The Latest Trumors

by Beverly Rosenbaum

Can We Talk?

I don’t think so… In this age of voice-activated gadgets, continuous voice recognition software is still a young technology. Speech recognition products are rapidly improving and prices are dropping, but accuracy (the most important feature) has barely reached 95 percent. There is no doubt that accuracy has improved significantly over a short period of time and may be approaching an acceptable level, but correcting mistakes is still painfully time-consuming. Accuracy ceases to be an issue only in the 99 percent-or-better range, which is still a few years away.

PC Week Labs recently interviewed Ray Kurzweil, the founder of speech and sound synthesis company Kurzweil Applied Intelligence Inc. and chief technology officer for Lernout & Hauspie Speech Products USA Inc., which acquired Kurzweil’s company in 1997. Kurzweil said he believes that widespread, speaker-independent, continuous natural speech recognition is about five years away. “As more powerful computers become available, more sophisticated speech recognition algorithms will allow dictation programs to become more human-like in their response.” The current synthesized voice can be difficult to understand.

There are four major players in the speech software arena, but it is virtually impossible to compare their products fairly because the feature sets vary sharply, as do the prices. Dragon NaturallySpeaking consistently delivered the best accuracy in PC Magazine tests, ranging from 85 to 95 percent but averaging only 91 percent. Voice Xpress Plus was second in accuracy, averaging 87 percent, followed by ViaVoice (85 percent) and FreeSpeech98 (80 percent). PC Computing agreed that the Dragon product was the most accurate, but endorsed IBM’s ViaVoice for its effortless combination of command control and dictation (overlooking its lower rate of accuracy). Testers agreed that ViaVoice’s accuracy was highly inconsistent, and that the program was better at taking orders than taking dictation. The productivity gain from voice software depends on the user’s typing speed: obviously, the improved throughput of dictation is more dramatic for a person who types only 40 words a minute.
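To put rough numbers on the link between accuracy, correction time and typing speed, here is a minimal sketch. The dictation rate of 100 words a minute and the 15 seconds needed to fix each misrecognized word are assumptions chosen purely for illustration, not measurements from any of the tests cited; only the accuracy percentages and the 40-words-a-minute typist come from the text above.

```python
def errors_in_document(word_count: int, accuracy: float) -> int:
    """Expected number of misrecognized words, rounded to the nearest word."""
    return round(word_count * (1.0 - accuracy))


def effective_wpm(dictation_wpm: float, accuracy: float, seconds_per_fix: float) -> float:
    """Net words per minute once correction time is added to speaking time."""
    words = dictation_wpm                      # words spoken in one minute
    wrong_words = words * (1.0 - accuracy)     # expected misrecognitions
    fix_minutes = wrong_words * seconds_per_fix / 60.0
    return words / (1.0 + fix_minutes)         # one minute speaking plus fixing


if __name__ == "__main__":
    ASSUMED_DICTATION_WPM = 100   # assumption, not from the article
    ASSUMED_SECONDS_PER_FIX = 15  # assumption, not from the article
    TYPIST_WPM = 40               # the typist mentioned in the article

    for accuracy in (0.91, 0.95, 0.99):
        net = effective_wpm(ASSUMED_DICTATION_WPM, accuracy, ASSUMED_SECONDS_PER_FIX)
        verdict = "faster" if net > TYPIST_WPM else "slower"
        print(f"{accuracy:.0%} accurate: {errors_in_document(500, accuracy)} errors "
              f"per 500 words, about {net:.0f} net wpm ({verdict} than typing)")
```

Under those assumed figures, dictation at 91 percent accuracy nets roughly 31 words a minute, slower than the 40-words-a-minute typist, while 99 percent accuracy nets about 80. Change the assumptions and the break-even point moves, but the shape of the argument stays the same.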

Dragon Systems introduced its first-generation mainstream product in the spring of 1997. This small Massachusetts company led the way in hands-free computing with its discrete-speech DragonDictate program. IBM followed in late summer with ViaVoice. At that time, neither product went beyond word processing to Windows command and control. But since then, the companies have engaged in a full-fledged price and feature war as they have added to their product lines. Before the latest release of entry-level products, full desktop programs of this type cost several hundred dollars.

With the introduction of ViaVoice Gold in November 1997, IBM brought out the first product that allowed dictation using continuous speech directly into all Windows applications. Dragon then released NaturallySpeaking Personal; NaturallySpeaking Preferred, which includes, among other new features, playback of your dictation for proofreading (adopting an IBM feature); and NaturallySpeaking Deluxe, which adds the full DragonDictate discrete speech program.

In summer 1998, IBM released ViaVoice 98 in three versions: Home, Office and Executive. Dragon now offers version 3 of NaturallySpeaking, also in three forms: Personal, Preferred and Professional. Dragon Preferred and ViaVoice 98 Executive both sell for $150.

Lernout & Hauspie took over Kurzweil’s dictation line and introduced its own continuous speech products, Voice Xpress and Voice Xpress Plus, in late spring of 1998. Voice Xpress allows continuous speech dictation into its own text editor, while the Plus version ($70) adds dictation into Microsoft Word. Kurzweil’s early VoiceCommands program was the first to use “natural language commands” for voice-enabled command tools. This feature, now in Voice Xpress Plus as well as the version 3 releases of NaturallySpeaking (Preferred and Professional, through the NaturalWord utility) and in ViaVoice 98, lets you use natural phrases to tell Microsoft Word what to do. The user can issue different phrases to carry out the same action, using the sort of speech that an actual human being would use. This is true techno-ergonomics: the software fits the user; the user doesn’t have to fit the software. For example, in Kurzweil VoiceCommands, you can select something and accomplish the same formatting result by saying commands like: “make this two points larger” or “increase the font size by two points” or “make the font bigger by two.”
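The idea that several phrasings lead to one action can be pictured with a very small sketch. This is only an illustration of the many-to-one mapping, not how VoiceCommands or any of the products reviewed here actually work; real products presumably rely on far more flexible grammars. The action name `change_font_size` and the helper functions are hypothetical, and only the three phrases come from the example above.

```python
# Hypothetical action record: (action name, size change in points).
INCREASE_FONT_BY_TWO = ("change_font_size", 2)

# Several natural phrasings, all pointing at the same action.
PHRASES = {
    "make this two points larger": INCREASE_FONT_BY_TWO,
    "increase the font size by two points": INCREASE_FONT_BY_TWO,
    "make the font bigger by two": INCREASE_FONT_BY_TWO,
}


def normalize(utterance: str) -> str:
    """Lowercase the utterance and collapse runs of whitespace."""
    return " ".join(utterance.lower().split())


def interpret(utterance: str):
    """Return the action for a recognized phrase, or None if unknown."""
    return PHRASES.get(normalize(utterance))


if __name__ == "__main__":
    for spoken in ("Make this two points larger",
                   "increase the font size by two points",
                   "sing me a song"):
        print(repr(spoken), "->", interpret(spoken))
```

A fixed phrase table like this breaks down quickly as the number of phrasings grows, which is one reason to treat it as a picture of the concept rather than a workable design.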

The low-cost FreeSpeech98 from Philips ($39 direct) provides command and control of the computer plus continuous speech dictation. This limited product has no printed documentation. It does not support multiple users or dictionaries, nor does it support macros for commonly used phrases, as do products from all three of the other companies. It is also the only product that does not come with a microphone; Philips recommends its own SpeechMike ($69.95 direct).

System Requirements

Speech recognition software programs are very power-hungry. While the published range of adequate systems includes Pentium processors from 133 to 200MHz with 32 megs of RAM (48 megs for NT), you may find yourself staring at the Windows hourglass even with a Pentium/266 and 64 megs of RAM. All of the applications need nearly 200MB of hard disk space, a 16-bit SoundBlaster-compatible sound card, and Windows 95, 98 or NT. Testing in PC Magazine Labs was done using a 300MHz Pentium II running Windows 98 with 128 megs of RAM and a Creative Labs SoundBlaster AWE64 Gold sound card.

Accuracy can be improved with dictation practice and by completing a “training” process. All of the programs include a wizard to get started, and the training can take from 30 minutes to an hour. Unlike NaturallySpeaking and ViaVoice, Voice Xpress does not allow you to return to speaker enrollment to fine-tune the speaker profile.

Microphones

The VXI Parrott 10-3 microphone bundled with NaturallySpeaking performs well, while the Andrea NC-80 included with ViaVoice and Voice Xpress is not as comfortable as the Parrott.

The Philips SpeechMike (sold separately) is a very interesting device, combining a good quality microphone, speaker and trackball into one hand-held device.

The Philips SpeechMike trackball has programmable buttons, which allow you to have different sets of programmed buttons that turn on automatically when you switch programs. For someone using a variety of applications, this can save a lot of steps. For instance, if you were entering numbers in an Excel spreadsheet, you might want one button programmed to do the autosum function, another to act as F2 for cell editing, and another to select a row. In an e-mail application, you might have one button programmed to send mail, another to open the address book, and another to insert boilerplate text. Once set up this way, the SpeechMike does all the work, switching sets as Windows switches focus. It also works along with your existing mouse, trackpad or trackball.
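The per-program button sets described above amount to a lookup table keyed by whichever application has the focus. The sketch below illustrates that idea only; it is not the SpeechMike’s actual software, and the application keys, button numbers and function names are all hypothetical. The button assignments themselves follow the Excel and e-mail examples in the text.

```python
# One set of button assignments per application (all names are hypothetical).
BUTTON_SETS = {
    "excel": {1: "autosum", 2: "edit cell (F2)", 3: "select row"},
    "email": {1: "send mail", 2: "open address book", 3: "insert boilerplate text"},
}


def buttons_for(app_name: str) -> dict:
    """Return the button set for the application that just received focus."""
    return BUTTON_SETS.get(app_name.lower(), {})


def press(app_name: str, button: int):
    """Return the action bound to a button in the focused application, if any."""
    return buttons_for(app_name).get(button)


if __name__ == "__main__":
    # Simulate Windows switching focus between programs.
    for app in ("Excel", "Email", "Notepad"):
        print(app, "button 1 ->", press(app, 1))
```

The point of the design is that the user never switches sets by hand: the focused application selects the set, and an unprogrammed application simply has no bindings.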

One source for other microphones is Speech Controlled Computer Systems (www.speechcontrol.com). They carry a high-quality Sennheiser MD431 II, available in a hand-held model or mounted in a stand. It has a mute switch, but no speaker, so you would need something else for playback or screen reading. Desktop speakers are fine if you have your own office. SCCS also carries the Andrea NC-600, a model offering a mute switch, good noise cancellation and a very stable and easy-to-move boom. Two new Andrea microphone products released in October are the PCTI (Personal Computer/Telephone Interface), which allows you to switch between talking on the phone and dictating in your speech recognition software, and the AWS-100 (Andrea Wireless Solution), an infrared (IR) model that brings untethered freedom of movement and high-quality voice transmission to speech recognition, audio/video conferencing and multimedia presentations.

Talking Technologies’ Talk Mic is extremely lightweight, and the D-frame of the ear piece spares your ear the constant pressure applied by other mikes, which can become uncomfortable. The Talk Mic provides good recognition and has a short boom, which helps put the mike element in the right place, to the side of the mouth. By contrast, the booms on the VXI Parrott mikes and the ANC-600 are long and may need bending outward and back in again (in a big C) to make sure the mike element is not in the breath path.

Plantronics offers the H51N telephone headset, which can be used with a switch box. It’s comfortable and has noise cancellation, but the boom is stiff and hard to move. The non-noise-canceling version (H51) provides good results too; this is the model seen in offices everywhere, with a quick disconnect and a clear “voice tube” that can quickly swivel out of the way. If you need to switch between telephone and computer a lot, this mike is worth considering. The toggle switch on the switch box can be used as a mute, similar to the one on the VXI Parrott mikes.

Versions of NaturallySpeaking and ViaVoice have already found their way into office suites from Corel and Lotus. Lotus has recently released new SmartMaster templates for 1-2-3 that allow you to enter data into spreadsheets; they can be downloaded from www.lotus.com/home.nsf/welcome/smartsuiteupdates. Microsoft has invested in Lernout & Hauspie’s Voice Xpress, which already offers tight integration with Word, so it’s clear that speech recognition will find its way into future versions of Office.

Dragon Systems also supplies Power Secretary for the Macintosh, and IBM’s speech engine comes with OS/2.

You can download Microsoft’s SDK Suite, which includes the speech recognition engine and text-to-speech engine, from microsoft.com/iit/. Microsoft’s unreleased dictation software for Windows 95 and NT 4.0 can be downloaded from www.microsoft.com/msdownload/msdictation/01000.htm

Other interesting sources for Internet browsing include the Conversa Web browser for Internet Explorer 4.0, which offers speakable hot links (www.conversa.com). IBM’s VoiceType Connection for Netscape 3.0 comes with IBM’s Simply Speaking Gold and also works with ViaVoice Gold and ViaVoice 98; it is available for separate purchase at www.software.ibm.com/is/voicetype/vtconn.html.

The reach of speech dictation will ultimately be expanded into other applications. Dragon, in collaboration with Actioneer Inc., will incorporate its continuous-speech recognition technology in PIM (Personal Information Manager) products set for release soon, and Lernout & Hauspie plans to leverage its speech recognition technology for Web browsing, Internet and database searches, and e-mail. By the time you read this, L&H will have released a Legal Edition of Voice Xpress as well as a Medical Edition for Cardiology. Ray Kurzweil described a “translating telephone” that would, with only a slight delay, translate from one language to another. He predicts that such a device will be available from Lernout & Hauspie in the next couple of years.

IBM Research has also been working to recognize large vocabularies of continuous speech in the medical field. An example is MedSpeak, used by radiologists to dictate medical reports into a computer.

In mid-1995 the IBM research team began collaborating with radiologists from two hospitals, Memorial Sloan-Kettering Cancer Center in New York and the Massachusetts General Hospital in Boston, to produce MedSpeak/Radiology, a real-time continuous-dictation product.

MedSpeak/Radiology consists of software, which runs on a personal computer with a 200MHz Pentium Pro processor, and a special noise-canceling microphone (handheld or headset). Traditionally, the radiologist (the physician responsible for interpreting X-rays, MRIs, and CAT scans) has dictated patient information onto tape, which is then transcribed. That information is then returned to the radiologist for final review and signature, all of which can take between several hours and several days.

MedSpeak would allow radiologists to dictate, edit, and electronically sign their own reports in real time using a personal computer. This capability could reduce report turnaround time and transcription costs, while potentially increasing the confidentiality of patient notes and charts through reduced reliance on outside services.

However, accuracy remains the greatest hurdle to be overcome before this type of product will be successful.

Check out more details about the speech recognition programs of Dragon Systems, www.dragonsys.com; IBM, www.software.ibm.com/is/voicetype; Lernout & Hauspie (Kurzweil), www.lhs.com; and Philips, www.freespeech98.com.

The bottom line for speech recognition software should be accuracy. If the software can’t decipher what you said correctly, nothing else matters. These applications remind me of scanners and early OCR (Optical Character Recognition) software a few years back.

People resorted to running a spellchecker to help overcome unacceptable accuracy rates for their scanned documents. If you are interested in programs that can do more than take basic dictation, you’ll find other productivity-enhancing features in these products. But if accuracy is what you’re looking for, they’re not quite ready for prime time.

Beverly Rosenbaum is a HAL-PC member who can be contacted at brosen@hal-pc.org.
