I talk too much as it is: Just ask my colleagues. The last thing they’d want is for me to sit at my desk dictating these columns out loud. Which is why I’m trying out the newest version of Nuance Corp.’s speech recognition software in an office just down the hall.
Burlington-based Nuance has just delivered the 13th edition of its Dragon NaturallySpeaking program, and it’s good stuff. NaturallySpeaking is highly accurate, runs well on even relatively slow computers, and uses the microphone built into many laptops so you don’t have to wear one of those silly-looking headsets. Yet it suffers from the usual problems of speech transcription software. It’s easily distracted by background noise, and its automatic punctuation feature all too often puts the periods and commas in the wrong places.
And it’s a long way from cheap: $200 for the premium edition that I tested or $99 for a less advanced version for home users. I’d never spend that much to avoid the trouble of typing. Speech recognition has a brilliant future when applied to less-finger-friendly devices — our smartphones, cars, and thermostats, for instance. But for personal computers, software like NaturallySpeaking is an acquired taste that nobody’s managed to acquire.
After all, similar software has been around for quite a while now, and you can get plenty of it for free. A simple version of speech recognition is built into Microsoft Windows software and Mac OS X. If you use the Chrome browser from Google Inc., you can do Google searches with your voice instead of a keyboard. Just say “OK, Google” and ask your question. (Microphone not included.) But speech controls for personal computers have been popular mainly with people with disabilities or specialized technical users such as doctors dictating the results of medical tests. And despite its excellence, Dragon NaturallySpeaking isn’t likely to change this.
I loaded the massive 3-gigabyte program onto a two-and-a-half-year-old laptop running an Intel Corp. Core i3 processor, far from the most powerful of PC chips. No matter; there was a little lag before the text appeared, but on the whole, Dragon NaturallySpeaking ran just fine.
Early speech recognition programs had to be trained to recognize individual voices. You’d spend an hour or so reading “Huckleberry Finn” or some other famous book into the microphone. Today’s software still learns from experience, becoming more accurate over time. But there are no lengthy training sessions, just a brief tuneup so Dragon can adjust the microphone.
From day one, it translated my words with well over 95 percent accuracy and impressed me with its massive vocabulary. The software had no trouble with unusual names such as philosopher Soren Kierkegaard, or even journalist Hiawatha Bray. But although NaturallySpeaking 13 works with a laptop’s built-in mic, its accuracy suffered badly when I tried one. It might not be Nuance’s fault; a lot depends on the room acoustics and the laptop’s mic quality. A better laptop might deliver better results. But I found that an old-school headset with microphone worked best.
I also gave up on the software’s auto-punctuation feature, which guesstimates where periods and commas should go. It only occasionally gets things right, and it’s no help at all for colons, semicolons, and such. Saying the word “period” or “question mark” produces better results, but it ruins the illusion of natural speech.
But then, natural speech usually makes for lousy reading. Ask me to tell you a story, and I babble. If you want something coherent, give me a keyboard. For lifelong typists, speech recognition software is a hindrance to good prose, rather than a help.
Lucky for Nuance that it’s a huge player in markets where speech recognition is truly useful — smartphones, for instance. The famous Siri voice-command system on Apple Inc.’s iPhone uses Nuance technology, and while it’s far from perfect, Siri’s a better option than the iPhone’s tiny keyboard. Google also makes an excellent speech-controlled personal assistant.
Mobile devices don’t have the processing power to do full-fledged speech processing; instead, they connect to massive data centers in the Internet cloud. But Intel is working on mobile chips powerful enough to do all the voice processing on the phone. That would mean pocket-sized devices with the power of NaturallySpeaking right on board — perfect for dictating notes and memos on the fly.
To me, it sounds like the ideal division of labor. Phones are for talking; computers are for typing.