Computer Counselor - July/August 2000
The Speech Recognition Address
By Daryl Teshima
Daryl Teshima is a practice systems attorney at Gibson, Dunn & Crutcher.
To test the accuracy of Via Voice and Naturally Speaking, I dictated Abraham Lincoln's Gettysburg Address. In the results below, missing punctuation is not signalled; errors are lined out. After I corrected the initial results in order to retrain the programs, I dictated the address repeatedly until I obtained acceptable results.
Naturally Speaking (Legal Suite 4.0), first try:
The said start and seven years ago are father is not forth upon this continent Indian nation, conceded liberty and dedicated to the proposition that all men are created equal.
They now engaged in a great similar or, testing whether that nation, or any nation so conceived and said dedicated cataloging and/or. Id. At on a great battlefield a that were. We have come to dedicate a promotion of that the old as a final resting place for those prohibited die otherwise in fact that nation to my delay in it is altogether 15 and proper that we should do this delay and
But in a larger sense crime we cannot dedicate cone we cannot concentrate, we cannot have no this ground. The buy demand, willing and dead Mistretta here have concentrated it filed by the output power to add or detract. The road where widow without no longer the member while we say here, but it can never forget what they did here. It is for us in the living rather to be and dedicated here to the unfinished work which there was that there have classify 700 the advanced. It is rather for us to be owed dedicated to the great task remaining before us-bad fountains on that we take increased devotion to that clause for which the date of the last for a measure of devotion-better with you are highly was so that the stud shall not have died in vain that this nation and guide shall have made no benefit of the Id., and that the amendment other paper crime a by the per crime a that the panel shall not position if MDS.
Via Voice Pro (Millennium Edition), first try:
For scoring seven years ago our fathers brought forth on this continent, and new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived in so dedicated, too long endure. We are mapped on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting place for those you here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, and a larger sense, we can not dedicate-we can not consecrate-we cannot hello-this ground. The brave man living and dead, she who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for Russ the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly of dance. It is rather for us to be here dedicated to the great task remaining before S-half from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion-that we here highly resolve that these dead shall not have died in vain-that this nation, under God, shall have a new birth of freedom-and that government of the people, by the people, for the people, shall not perish from the earth.
Naturally Speaking (Legal Suite 4.0), seventh try:
For score and seven years ago are fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation or any nation so conceived and so dedicated can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But in a larger sense, we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead who struggled here have consecrated it far above our poor power to Adler detract. The world with little note nor long remember what we say here, that it can never forget what they did here. It is for us to the living rather to be dedicated here to the unfinished work which they were filed here have thus far so nobly advanced. It is rather for us to be your dedicated to the great task remaining before us-that from these honored dead we take increased devotion to that cause for which investigated the last full measure of devotion-that we here highly resolve that these dead shall not have died in vain, that this nation undergone shall have a new birth of freedom, and that government of the people, by the people, for the people shall not perish from the earth.
Via Voice Pro (Millennium Edition), third try:
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might lead. Is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate-we can not consecrate-we can not hallow-this ground. The brave men, living in dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us and the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly of dance. It is rather for us to be here dedicated to the great task remaining before us-that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion-that we here highly resolve that these dead shall not have died in vain-that this nation, under God, shall have a new birth of freedom-and that government of the people, by the people, for the people, shall not perish from the earth.
Approximately every six months, new versions of speech recognition software appear, offering improved accuracy and affordable prices. Additionally, and in recognition of the importance of the legal market to dictation software, specialized modules for the law office are available. Improvements in speech recognition software, coupled with recent significant decreases in computer hardware prices, have made speech recognition programs affordable to law firms big and small. Although law firms can now often afford the technology, most lawyers do not use speech recognition software regularly because the programs developed a reputation for being disappointingly inaccurate.
More than two years ago, I coreviewed the first generation of continuous speech recognition programs, finding that the first versions of Dragon Systems' Naturally Speaking and IBM's Via Voice were technologically impressive but required expensive hardware and hours of practice to train the programs to the speaker's voice.1 The review also noted:
The microphone must be properly positioned at all times, or accuracy will drop. More important, users must spend time training before they can get consistently reliable results. For example, in a test after the initial installation, both programs stumbled over the Preamble to the Constitution. Naturally Speaking recognized "insure domestic tranquillity" as "in short domestic trend ability." And Via Voice recognized "in order to form a more perfect union" as "in order to formatted WordPerfect union."
In the two years since then, both IBM and Dragon Systems (recently purchased by Lernout & Hauspie, makers of another speech recognition product, Voice Xpress) have regularly released new versions. Are these latest speech recognition products-Dragon Systems' Naturally Speaking Deluxe (Legal Suite 4.0) and IBM's Via Voice Pro (Millennium Edition)-ready for use at a typical law firm? To answer that question, I tested both applications by dictating Abraham Lincoln's Gettysburg Address repeatedly until I obtained acceptably accurate results.
Before a computer user can start running either voice recognition program, the computer running it must be capable of handling the task. These programs require some computing muscle. Via Voice Pro's minimum system requirements are a Windows-based Pentium 233 MHz computer with 48 MB of RAM. Naturally Speaking's requirements are a Windows-based Pentium MMX (or equivalent) 300 MHz with 128 MB of RAM. As is true with most programs, they run too slowly on computers that do not exceed the minimum requirements. Both programs demand a substantial share of the computing capacity and memory of even the fastest, most powerful desktop. A faster processor and, more important, 128 MB or more of RAM can make a significant difference in performance.
Another hardware item that the computer will need is a sound card, the quality of which is critical. These programs also tax the processing abilities of sound cards. Older, low-end sound cards (especially those found in laptops) will give poor recognition results, even after users have trained their system and upgraded their microphones. Before purchasing a speech recognition program, users should visit the vendor's Web site to determine if the sound cards they have in their computers are compatible with the application. Those who are going to spend the time to train and use a speech recognition system should consider the cost ($100 to $200) of a top-of-the-line Sound Blaster (or compatible) sound card as money well spent.
Once the issues of computing speed, memory, and sound card are settled, preparations are complete. Users do not need to have microphones because each product includes an inexpensive noise-reducing headset. Naturally Speaking is bundled with VXI's Parrot Translator microphone, which includes an intermediate battery-powered amplifier. That amplifier helps match different sound cards to Dragon's software and improves recognition. The Andrea headset bundled with Via Voice does not contain an amplifier, but that omission does not seem to hinder the program's performance.
Both programs let users dictate directly into Word 97 or 2000 and WordPerfect 8 or 9, although Via Voice's WordPerfect integration is limited. Via Voice users can dictate directly into WordPerfect (as well as other Windows applications), but the program cannot control WordPerfect's menus and functions by voice as it can in Word. Naturally Speaking allows users to dictate and voice-control Word and WordPerfect.
The two programs also give users the option of dictating directly into a speech-enabled word processor similar to Microsoft's Word Pad, the basic word processor included in Windows. This basic word processor does not have a lot of bells and whistles, but it consumes fewer computer resources. It may be necessary for users to create documents in Word Pad if a computer's performance is slow when dictating directly into Word or WordPerfect. Once a file is created in Word Pad, it can easily be transferred to Word.
To begin dictating in either program, users click a microphone icon and start speaking. From that moment on, what you say is what you get, with a few exceptions. First, all punctuation must be spoken as necessary (e.g., "fourscore and seven years ago comma"). Users create a new line by saying "new paragraph" or "new line." Second, for numbers, dates, and other specially formatted text, users can preset options. For example, users can create a setting to spell out all numbers under 10. In Via Voice, the dictation command Spell Out lets users spell words letter by letter, thus reducing errors associated with proper names and abbreviations. Third, formatting text requires familiarity with voice commands that turn attributes on and off. Although it can be difficult to issue advanced formatting commands such as footnotes and font size by speech alone, basic formatting commands are usually intuitive (e.g., "bold this" or "underline that"). Both Via Voice and Naturally Speaking automatically capitalize the first letter of each sentence.
These basic formatting commands are practical only for dictating short letters or passages, however, and are cumbersome for documents that contain Latin terms, court citations, statutory references, and significant amounts of formatting. Because of these demanding requirements, lawyers should seriously consider purchasing the legal version of a voice recognition program. Dragon Systems' Legal Suite 4.0 (about $995) and IBM's Via Voice Legal Dictionary (about $149 as an addition to Via Voice Pro) incorporate a specialized legal vocabulary that recognizes court names, case reporters, and other common legal terms. These legal additions are a major improvement, but users still should not expect to dictate a perfect Supreme Court brief, with correct Blue Book citations, minutes after installing either program and taking the microphone out of the box.
Both programs let users control the word processor program while dictating. Via Voice, for example, switches into command mode if a user pauses during dictation. If the user's next utterance is not recognized as a command, then the program is supposed to switch back to dictation mode and process the speech as text.
This is a good idea in theory but difficult in practice. When I tested Via Voice, it had difficulty distinguishing between dictated text and program commands. Its command recognition feature never worked consistently, resulting in command errors whenever there was the slightest pause during dictation. The process became so cumbersome that I reconfigured Via Voice to go into command mode only when the command was preceded with the word "computer." Unless users can dictate flawless text without pausing, they may have to do the same.
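The keyword workaround described above can be pictured with a minimal sketch. It is an illustration of the idea only, not how Via Voice is implemented: the function name route and its keyword parameter are hypothetical, and the input is assumed to be text that has already been recognized.

```python
def route(utterance, keyword="computer"):
    """Classify one recognized utterance as a command or as dictated text.

    Hypothetical helper: an utterance counts as a command only when it
    begins with the keyword and has something after it.
    """
    text = utterance.strip()
    first, _, rest = text.partition(" ")
    # Ignore case and a trailing comma, so "Computer, bold this" also matches.
    if first.lower().rstrip(",") == keyword and rest.strip():
        return ("command", rest.strip())
    return ("dictation", text)


print(route("computer bold this"))
print(route("The computer crashed during dictation"))
```

Because the decision depends on the first word rather than on the length of a pause, a speaker who hesitates in mid-sentence is not thrown into command mode by accident.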
Accuracy and Editing
Speech recognition programs have come a long way. When I tested Via Voice and Naturally Speaking by dictating the Gettysburg Address, however, mistakes were evident, even after the systems had been trained with repeated readings. Words with sounds that are similar or identical understandably caused the most problems. Such words as "our" and "are," and "for" and "four," after all, sound exactly alike, and thus require a sense of context to distinguish. After each round of dictation, I corrected the recognition errors and dictated the passage again. I did so until an acceptable dictated version was obtained.
Via Voice produced significantly better results than Naturally Speaking. It took Via Voice only three tries to reach acceptable results, although a few mistakes were still present. Naturally Speaking's output, on the other hand, contained more mistakes than Via Voice's even after the seventh try, when an acceptable version of the speech was obtained.
More significant than these final results, however, are the initial dictation results, since most attorneys are unlikely to dictate the same letter or brief more than once. In Naturally Speaking, the first attempt generated so many mistakes that the result was practically unusable. Via Voice produced a better first draft, although there were still more than 10 mistakes that needed to be corrected. This error rate is quite high, considering that the Gettysburg Address contains no legal citations, no proper names, no Latin phrases, and only 264 words.
Most reviews of these products have reported accuracy rates of over 95 percent. A user's results (and enunciation) may vary, but if my test is a true indication, most lawyers will find that voice recognition software is still more trouble than it is worth. It should be noted that the test was conducted after initial training to adapt the programs to my voice.
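One way to put a number on accuracy is to count word-level errors (substitutions, insertions, and deletions) against the intended text. The sketch below is an assumption about scoring, not the method used for this test or by other reviewers; the function names are illustrative, and the sample is the opening sentence of Via Voice's first try. For scale, each error in a 264-word passage costs a little under 0.4 percentage points of accuracy.

```python
import re


def words(text):
    # Lowercase and keep only runs of letters and apostrophes, so
    # punctuation and capitalization are ignored when comparing.
    return re.findall(r"[a-z']+", text.lower())


def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions, and deletions."""
    ref, hyp = words(reference), words(hypothesis)
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy(reference, hypothesis):
    return 1.0 - word_errors(reference, hypothesis) / len(words(reference))


REFERENCE = ("Four score and seven years ago our fathers brought forth on "
             "this continent, a new nation, conceived in liberty, and "
             "dedicated to the proposition that all men are created equal.")
FIRST_TRY = ("For scoring seven years ago our fathers brought forth on "
             "this continent, and new nation, conceived in liberty, and "
             "dedicated to the proposition that all men are created equal.")

errors = word_errors(REFERENCE, FIRST_TRY)
print(f"{errors} errors in {len(words(REFERENCE))} words")  # 4 errors in 30 words
print(f"accuracy: {accuracy(REFERENCE, FIRST_TRY):.1%}")    # accuracy: 86.7%
```

Counting this way treats "For scoring" for "Four score and" as three errors and "and" for "a" as one. A reader who counts each misrecognized phrase as a single mistake would arrive at a lower total, which is one reason accuracy figures from different sources are hard to compare.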
Testing did confirm, however, that the accuracy of both programs improves over time. One reason is the customization of vocabulary. Both products feature large dictionaries and the ability to add words and phrases based on a user's particular needs and usage. Via Voice starts with an active dictionary of 64,000 common words. Users can add up to 2 million words or phrases. Naturally Speaking features an active dictionary of 160,000 frequently used words plus a backup dictionary of 250,000 words.
Customizing each program's vocabulary takes some effort. One way to do this is to have the program analyze documents that have already been created. This process scans the indicated text, which allows users to add unknown words to the vocabulary as well as analyze usage frequency and speech patterns. Incorporating this analysis into a speech file can greatly enhance word recognition.
The more common method is to correct recognition errors once dictation is completed. Correcting text is not the same as editing text. Editing does not necessarily mean that the program failed to recognize speech; correction does. When users correct an error, a speech recognition program can learn from the mistake. Accordingly, a key requirement for making speech recognition more accurate is to use the program's correction feature when correcting recognition errors. To make a correction in either program, users highlight the text containing the error and select the Correct Error command.
Unfortunately, this process is awkward, especially in Via Voice. Although it is possible to correct errors by voice, it becomes exceedingly frustrating to navigate the cursor and select the incorrect text. In reality, correcting mistakes in Via Voice required a combination of mouse, keyboard, and voice. The time it took to correct errors erased any productivity gains that would otherwise be achieved by using speech recognition.
Naturally Speaking's correction module was much easier to use and often could be navigated without resorting to the keyboard or mouse. When users make a correction, Naturally Speaking prompts them to say the correct word and the incorrect word, which helps it learn how users pronounce both words. However, the greater number of errors still added substantial time to the process of creating a document. For most touch-typists, it will likely be faster to type.
It is amazing that technology exists that can turn something as complicated and variable as human speech into text. This technology, however, is not ready for every lawyer's desktop. To dictate a usable first draft, a significant time investment is required to learn how to use these programs and to train them to recognize individual speech patterns. Legal jargon and writing conventions add more pieces to the puzzle. With both products, 100 percent accuracy is still not feasible. Most lawyers will make themselves more productive by typing documents themselves or having their assistants transcribe from tape.
1 Alan Adler & Daryl Teshima, New Developments in Voice Dictation Software, Los Angeles Lawyer, Jan. 1998, at 61. Also at http://www.lacba.org/lalawyer/tech/comp1-98.html.