Computer Counselor - January 1998
New Developments in Voice Dictation Software
Attorneys who would rather talk than type have software options
By Alan Adler and Daryl Teshima
Alan Adler is a principal in LexTech, Inc., a legal technology consulting firm, and Daryl Teshima is the editor of Law Office Computing, a bimonthly technology magazine.
Practical voice dictation software has long topped many attorneys' technology wish lists. Because most lawyers can speak faster than they can type, the ability to turn speech into word processing files should make drafting documents easier and quicker. Unfortunately, past voice dictation programs have been disappointing. Older programs required discrete speech: users had to pause between words. Dictating a mere 60 words per minute (about typing speed) meant pausing 60 times a minute as well. In addition, the older voice recognition products required expensive, cutting-edge hardware and software.
The new generation of voice dictation software has overcome many of these problems. NaturallySpeaking Personal Edition 1.0 (Dragon Systems) and ViaVoice (IBM) both deliver true large-vocabulary continuous speech recognition, which means that users no longer have to pause while dictating. Better yet, the retail price of these products is less than $149.
Both products have steep hardware requirements, but recent drops in memory and processor prices make it easier for attorneys to run these programs effectively. NaturallySpeaking requires at least a 133 MHz Pentium processor and 32 MB of RAM. ViaVoice's minimum requirements are slightly higher: 32 MB of RAM and a 150 MHz Pentium MMX processor. As with other programs, however, the minimum requirements are not the same as practical requirements; on minimal configurations, both programs can be expected to run quite slowly. A faster processor (at least a 166 MHz Pentium MMX) and, more important, at least 64 MB of RAM make a significant difference in both programs' performance. Although a computer with a 166 MHz Pentium MMX chip and more than 64 MB of RAM may exceed what most law firms currently need, anyone who buys such a machine should not have to upgrade it for some time to come.
ViaVoice and NaturallySpeaking run only under Windows 95 or Windows NT; they do not work with Windows 3.1x or DOS. NaturallySpeaking occupies 62 MB of hard disk space, and ViaVoice consumes 75 MB. Each product includes an inexpensive noise-reducing microphone headset, although regular users may want to replace it with one that is more durable and comfortable. A CD-ROM drive is needed for installation, and both programs work with SoundBlaster-compatible sound cards. ViaVoice also supports IBM's Mwave sound device, and NaturallySpeaking supports certain other sound cards as well (contact Dragon Systems if your sound card is not a SoundBlaster).
Installation and Training
Installation is straightforward for both programs, but users should keep several important guidelines in mind. Although they no longer need to pause between words, users still need to enunciate clearly. The microphone must also stay properly positioned at all times, or accuracy will drop. Most important, users must spend time training the software before they can get consistently reliable results.
For example, in a test right after installation, both programs stumbled over the Preamble to the Constitution. NaturallySpeaking recognized "insure domestic tranquillity" as "in short domestic trend ability," and ViaVoice recognized "in order to form a more perfect union" as "in order to formatted WordPerfect union." ViaVoice requires users to spend about five minutes dictating with the Dictation Trainer, but this module only teaches users how to use ViaVoice; it does not train the program to recognize the user's voice. To improve ViaVoice's recognition accuracy, users can run the "enrollment" module, which asks them to read as many as 265 sentences. Each time a user reads a sentence accurately, the program presents the next one. ViaVoice then adjusts itself to the user's dictation style.
NaturallySpeaking likewise requires only a few sentences of initial training, but users must complete a general training session before the program can save changes to their speech files. During a general training session, users read a series of sentences aloud. As each sentence is recognized, its text changes from blue to black; when the program needs the user to try again, an arrow indicates where to start. The session takes about 20 minutes. Users can do further general training by reading supplemental texts, and the more they train, the better the accuracy.
Both programs come with basic word processors similar to Microsoft's WordPad, the simple word processor included with Windows 95 and NT. Neither rivals the feature set of Word or WordPerfect; they serve mainly as intermediaries for converting speech into text. After dictating in the basic word processor, users cut (or copy) the text and paste it into their regular word processor. With NaturallySpeaking, users can accomplish this task with voice commands. With ViaVoice, users press a Transfer button on the toolbar or save the text in Word 6, Rich Text Format (RTF), or plain-text format. NaturallySpeaking supports only RTF and plain text.
ViaVoice can also dictate directly into Microsoft Word 95 and 97. Word integration relies on a Word template that loads automatically each time Word starts, which delays Word's startup whether or not ViaVoice is in use. The template adds a dictation toolbar and menu for starting and stopping dictation and correcting errors. By right-clicking on text, users can correct a recognition error or get help correcting text.
With ViaVoice, users begin dictating by clicking the Begin Dictation button on the dictation tool bar or menu. The program responds by saying, "Begin dictation." To stop dictation, users can say, "Stop dictation," or click the Stop Dictation button. Whenever the program stops listening, users will hear "dictation stopped." Both programs often lag several words behind the speaker, so these audible cues are quite helpful.
As the speaker dictates with ViaVoice, words appear in the text window as they are recognized. If the program does not understand a word, it displays that word in reverse text until it settles on its best guess. This often slows the conversion of speech to text, making it easy for users to lose their place. ViaVoice also offers two other dictation modes: spell mode lets users spell out words, and number mode lets users enter numbers one digit at a time.
To make a correction, users highlight the text containing the error and select Correct Error. The program plays back the snippet of speech it associates with the highlighted text and presents a list of alternative words, if any. Users can also simply type the correct spelling, and they have the option of converting a numeral into word form or adding a phrase to the dictionary. Each correction helps keep the program from making the same mistake twice. Users can correct errors as they occur or at the end of the dictation session, or they can save the recorded speech with the document and correct it later. Even a three-paragraph document saved with recorded speech, however, makes a huge file.
With NaturallySpeaking, users start dictating by clicking the microphone on the program's toolbar or by pressing the plus key on the numeric keypad. To stop the program from listening, users say, "Go to sleep," or press the plus key again. Unlike ViaVoice, NaturallySpeaking provides no audible cues.
As users dictate with NaturallySpeaking, the spoken words first appear in a yellow Results Box, which shows the words and phrases the program is converting and so helps users track the status of their dictation. When it finishes converting, the program inserts the text into the document window.
NaturallySpeaking features terrific hands-free editing capabilities. To correct an error, users can simply say, "Correct that," to bring up a correction dialogue box. They then choose from a numbered list of alternatives by saying "Select" followed by the correct word's position on the list, or they can spell the word. If NaturallySpeaking continues to have trouble, users can click the Train button to teach the program to distinguish between the word they wanted and the word it chose. Training produces better results than simply using the correction dialogue.
Accuracy and Editing
Testing revealed that the accuracy of both programs improved greatly over time, in part because each correction helps keep them from repeating the same mistake. Although neither program is 100 percent accurate right out of the box, a few weeks of consistent use will produce acceptable results. The key to improving accuracy is building the programs' vocabularies.
Both products feature large dictionaries and let users add words and phrases that reflect their own needs and usage. ViaVoice starts with an active dictionary of 22,000 common words, and users can add up to 42,000 more words or phrases of their own. NaturallySpeaking features an active dictionary of 30,000 frequently used words plus a backup dictionary of 230,000 words. The program monitors word usage and moves words between the active and backup dictionaries. Users can also add new words not found in either dictionary.
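The column does not say how NaturallySpeaking decides which words to move between its two dictionaries. As a rough illustration only, the Python sketch below assumes a simple rule: a word dictated from the backup dictionary is promoted to the active one, and when the active dictionary is full, its least-used word is demoted. The class name, the capacity parameter, and the eviction rule are all hypothetical, not Dragon's actual design.

```python
from collections import Counter


class TwoTierVocabulary:
    """Hypothetical active/backup dictionary; not Dragon's actual design."""

    def __init__(self, active_words, backup_words, capacity):
        self.capacity = capacity                # maximum size of the active tier
        self.active = set(active_words)
        self.backup = set(backup_words) - self.active
        self.usage = Counter()                  # times each word has been dictated

    def record_use(self, word):
        """Count a dictated word; promote it if it is in the backup tier."""
        self.usage[word] += 1
        if word in self.backup:
            self._promote(word)

    def _promote(self, word):
        self.backup.discard(word)
        self.active.add(word)
        if len(self.active) > self.capacity:
            # Demote the least-used active word (ties broken alphabetically),
            # never the word that was just promoted.
            victim = min(self.active - {word}, key=lambda w: (self.usage[w], w))
            self.active.discard(victim)
            self.backup.add(victim)


vocab = TwoTierVocabulary(["court", "motion", "the"], ["estoppel", "laches"], capacity=3)
for spoken in ["court", "the", "estoppel"]:
    vocab.record_use(spoken)
print(sorted(vocab.active))   # ['court', 'estoppel', 'the']
print(sorted(vocab.backup))   # ['laches', 'motion']
```

In this toy run, dictating "estoppel" pulls it into the full active list, and "motion," the only active word never dictated, is pushed out to the backup list.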
ViaVoice and NaturallySpeaking both offer ways to quickly customize their dictionaries to reflect a user's particular vocabulary and speech patterns. To use ViaVoice's Vocabulary Expander, for example, users open a file and highlight the text they want evaluated. The program lists the words in that text that are not in its dictionary, and users select and train the ones they want added to the vocabulary. Users can add up to 1,000 words at a time and can indicate whether the text reflects their normal usage; if so, the program also notes the speech patterns uncovered in its analysis.
NaturallySpeaking also features a fine vocabulary utility called Vocabulary Builder. As with ViaVoice, users specify documents containing words representative of their normal vocabulary. But whereas the ViaVoice utility processes one file at a time, Vocabulary Builder lets users specify any number of files totaling up to 20,000 words. It then scans the text, lets users add unknown words to the vocabulary, and analyzes usage frequency and speech patterns.
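The scanning step both utilities perform, finding words in a user's documents that the program does not yet know, can be sketched in a few lines. This Python example is hypothetical: the function name, the simple word pattern, and the way it stops at a 20,000-word total (the limit the column cites for Vocabulary Builder) are illustrative assumptions, not how either product actually works.

```python
import re
from collections import Counter

WORD_LIMIT = 20_000  # total-word cap the column cites for Vocabulary Builder


def find_unknown_words(documents, vocabulary, limit=WORD_LIMIT):
    """Scan up to `limit` words across `documents` and return (word, count)
    pairs for words missing from `vocabulary`, most frequent first."""
    known = {w.lower() for w in vocabulary}
    counts = Counter()
    scanned = 0
    for text in documents:
        # Hypothetical tokenizer: a letter followed by letters, apostrophes, or hyphens.
        for word in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            if scanned >= limit:
                return counts.most_common()
            scanned += 1
            if word.lower() not in known:
                counts[word.lower()] += 1
    return counts.most_common()


docs = ["The plaintiff filed a motion in limine.",
        "Defendant opposed the motion in limine."]
vocab = {"the", "plaintiff", "filed", "a", "motion", "in", "defendant", "opposed"}
print(find_unknown_words(docs, vocab))   # [('limine', 2)]
```

Ranking unknown words by frequency puts the terms a user dictates most often at the top of the list, which is where training effort is likely to pay off first.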
Word processing in most law firms means more than simply typing text. Users need to take the text and put it into a memo, pleading, or letter. Neither product currently allows users to take complete advantage of a full-featured word processor's editing and formatting capabilities. However, there are several differences between the two programs' abilities in the area of hands-free editing.
ViaVoice lets users dictate directly into Microsoft Word, but its editing and formatting features are quite limited. Users cannot select text, apply formatting, or navigate a document with voice commands. The only things users can do by voice are capitalize text and begin a new paragraph.
NaturallySpeaking, by comparison, is better at hands-free navigation and editing, and its basic word processor gives users many of the editing and formatting tools they will need. Besides controlling capitalization and creating new paragraphs by voice, users can move the cursor by speaking commands such as "go to top" or "move down two paragraphs." The program also lets users select text by text unit (e.g., "select word" or "select paragraph") or by relative text unit (e.g., "select next word" or "select next 12 words"). Users can also select specific text by saying "select" followed by the target text, a feature that makes editing a breeze: whatever the user dictates next replaces the selection. Selected text can be cut, copied, or deleted, and its font, size, alignment, case, or attributes changed. In short, users can quickly accomplish basic word processing tasks while rarely touching the keyboard or mouse.
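Commands like these amount to a small grammar layered on top of ordinary dictation. The Python sketch below shows one hypothetical way to tell the commands quoted above apart from plain dictation; the action tuples, the list of text units, the "move up" form, and the small set of number words are assumptions for illustration, not NaturallySpeaking's actual command set.

```python
# Hypothetical vocabulary for the sketch; a real product would cover far more.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNITS = {"word", "words", "line", "lines", "sentence", "sentences",
         "paragraph", "paragraphs"}


def parse_count(token):
    """Turn "12" or "two" into an int; return None for anything else."""
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def singular(unit):
    """Normalize "words" to "word", "paragraphs" to "paragraph", and so on."""
    return unit[:-1] if unit.endswith("s") else unit


def parse_command(utterance):
    """Map a spoken editing command to an action tuple, or return None
    so the caller treats the utterance as ordinary dictation."""
    words = utterance.lower().split()
    if not words:
        return None
    if words == ["go", "to", "top"]:
        return ("move_to", "top")
    # "move down two paragraphs", "move up 3 lines"
    if (len(words) == 4 and words[0] == "move" and words[1] in ("up", "down")
            and words[3] in UNITS and parse_count(words[2]) is not None):
        return ("move", words[1], parse_count(words[2]), singular(words[3]))
    if words[0] == "select" and len(words) > 1:
        rest = words[1:]
        # "select word", "select paragraph": the unit at the cursor
        if len(rest) == 1 and rest[0] in UNITS:
            return ("select", "current", 1, singular(rest[0]))
        # "select next word", "select next 12 words"
        if rest[0] == "next":
            if len(rest) == 2 and rest[1] in UNITS:
                return ("select", "next", 1, singular(rest[1]))
            if (len(rest) == 3 and rest[2] in UNITS
                    and parse_count(rest[1]) is not None):
                return ("select", "next", parse_count(rest[1]), singular(rest[2]))
        # Anything else after "select" is treated as target text to find.
        return ("select_text", " ".join(rest))
    return None


print(parse_command("select next 12 words"))       # ('select', 'next', 12, 'word')
print(parse_command("move down two paragraphs"))   # ('move', 'down', 2, 'paragraph')
print(parse_command("in order to form"))           # None
```

The sketch also exposes a trade-off: because any utterance beginning with "select" is read as a command, a dictated sentence that happens to start with that word would be misinterpreted.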
ViaVoice and NaturallySpeaking demonstrate that science fiction continues to become science reality. With computer processing power getting cheaper every day, these programs give attorneys, especially technophobic ones, an alternative way to create documents. Although users should not throw away the keyboard yet, expect voice dictation software to play a larger and more important role in law firms in the years ahead.
Briefs & Bytes: Just before this column went to press, another continuous speech package emerged: Typhoon Software's IBM ViaVoice for WordPerfect. Based on the IBM ViaVoice Gold engine, it has all the capabilities of IBM's standard speech recognition product (reviewed above) and lets you dictate directly into WordPerfect 7 or 8. Unlike IBM's ViaVoice integration with Microsoft Word, however, Typhoon Software has voice-enabled nearly every WordPerfect toolbar, menu, and editing function. You can also use speech to perform some basic Windows commands in other applications, including Novell GroupWise. Legal users can achieve even greater accuracy by purchasing the Legal Edition of the package, which supplements the program's 22,000-word dictionary with an additional 20,000 legal terms.
This functionality comes at a price. Although the listed minimum hardware requirements are a 166 MHz Pentium MMX processor and 32 MB of RAM, Typhoon Software recommends a 200 MHz Pentium MMX processor and 64 MB of RAM. For greater accuracy, company officials also recommend using a genuine Creative Labs SoundBlaster card. The package also costs more than Dragon's and IBM's standard speech packages: Typhoon's standard package retails for $299, and the Legal Edition retails for $449. For more information, call (805) 966-7633 or visit Typhoon's Web site at www.typhoon.com.