SPEECH RECOGNITION CORRECTION FOR
DEVICES HAVING LIMITED OR NO DISPLAY
CROSS-REFERENCES TO RELATED APPLICATIONS
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition computer applications, and more specifically to an apparatus and method of correcting strings of text in a predominantly speech-only environment such as dictating a message over a telephone.
2. Description of Related Art
Optimally, when an author prepares an electronic message for an intended recipient, the author enjoys all the conveniences inherent in using a standard QWERTY keyboard and visual monitor. Specifically, the keyboard facilitates efficient entry of the electronic message and the visual monitor provides visual feedback that enables the author of the electronic message to ensure that the electronic message is properly recorded before it is transmitted. Oftentimes, however, the author's effective use of either the keyboard or monitor may be inhibited. For example, in the case of a vehicle-based computer, the author's hands and eyes may be occupied while driving the vehicle and thus, a standard QWERTY keyboard may not be available.
Similarly, a QWERTY keyboard may not be available in the use of a "wearable computer". A wearable computer comprises a battery-powered computer system that is worn on a speaker's body, for instance on the speaker's belt, backpack, vest, and the like. Wearable computers are designed for mobile and predominantly hands-free computer operations. Wearable computers typically incorporate a head-mounted display and have means for accepting and processing speech input. However, wearable computers typically do not include a fully operational QWERTY keyboard.
Finally, a traditional alphanumeric keyboard may not be available in the use of a cellular phone, pager, personal digital assistant, or other portable computing device. Specifically, an author may desire to compose an electronic message using a portable computing device even though a QWERTY keyboard may not be included therewith. An example of such circumstance can include creating a pager message for an intended recipient or reciting information for use on a standardized form such as a shipping label or a business-to-business purchase order.
Notwithstanding, modern speech recognition applications can utilize a computer to convert acoustic signals received by a microphone into a workable set of data without the benefit of a QWERTY keyboard. Subsequently, the set of data can be used in a wide variety of other computer programs, including document preparation, data entry, command and control, messaging, and other program applications as well. Thus, speech recognition is a technology well-suited for use in devices not having the benefit of keyboard input and monitor feedback.
Still, effective speech recognition can be a difficult problem, even in traditional computing, due to a wide variety of pronunciations, individual accents, and the various speech characteristics of multiple speakers. Ambient noise also frequently complicates the speech recognition process, as the computer may try to recognize and interpret the background noise as speech. Hence, speech recognition systems can often misrecognize speech input, compelling the speaker to perform a correction of the misrecognized speech.
Typically, in traditional computers, for example a desktop PC, the correction of misrecognized speech can be performed with the assistance of both a visual display and a keyboard. However, correction of misrecognized speech in a device having limited or no display can prove complicated if not unworkable. Consequently, a need exists for a correction method for speech recognition applications operating in devices having limited or no display. Such a system could have particular utility in the context of a speech recognition system used to dictate e-mail, telephonic, and other messages on devices having only a limited or no display channel.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for speech recognition correction is provided for devices having a limited or no display channel. The method is preferably implemented by a machine readable storage mechanism having stored thereon a computer program, the method comprising the following steps. First, audio speech input can be received and speech-to-text converted to speech recognized text. Second, a first speech correction command for performing a correction operation on speech recognized text stored in a text buffer can be detected in the speech recognized text. Third, if a speech correction command is not detected in the speech recognized text, the speech recognized text can be added to the text buffer. Fourth, if a speech correction command is detected in the speech recognized text, the detected speech correction command can be performed on speech recognized text stored in the text buffer.
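The third and fourth steps above amount to a dispatch on each piece of speech recognized text. The following is a minimal sketch of that dispatch, not the implementation of the invention: the function names, the single "delete" command phrase, and the choice to model the text buffer as a list of phrases are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical command table. The exact command wording and the effect of
# each command are assumptions, not taken from the specification.
Command = Callable[[List[str]], None]


def delete_last(buffer: List[str]) -> None:
    """Remove the most recently dictated phrase, if any."""
    if buffer:
        buffer.pop()


COMMANDS: Dict[str, Command] = {"delete": delete_last}


def process_utterance(recognized: str, buffer: List[str]) -> bool:
    """Apply one piece of speech recognized text to the text buffer.

    Returns True if the text was handled as a correction command and
    False if it was appended to the buffer as ordinary dictation.
    """
    command = COMMANDS.get(recognized.strip().lower())
    if command is None:
        buffer.append(recognized)  # third step: no command detected
        return False
    command(buffer)  # fourth step: perform the detected command
    return True
```

In this sketch a phrase is treated as a command only when the whole utterance matches a known command word; a real recognizer would presumably need a more robust way to tell commands from dictated text.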
Notably, the receiving step can further comprise the step of audibly confirming the speech-to-text conversion of the speech recognized text. The step of audibly confirming the speech-to-text conversion of the speech recognized text can comprise audibly playing back the recorded speech recognized text so that it can be determined if the recorded speech recognized text had been misrecognized in the converting step.
The first speech correction command can indicate a preference to terminate the speech correction method. Responsive to detecting this type of first speech correction command in the speech recognized text, it can be determined if the speech recognized text stored in the text buffer had been spelled out. If the speech recognized text stored in the text buffer had been spelled out, the speech recognized text can be added to a speech recognition vocabulary of speech recognizable words. Subsequently, the speech correction method can be terminated.
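The termination behavior described above can be sketched as follows. This is an illustrative outline only: the `BufferEntry` and `Session` types, the `spelled_out` flag, and the use of a set for the vocabulary are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BufferEntry:
    """One piece of text in the text buffer (hypothetical structure)."""
    text: str
    spelled_out: bool = False  # True if the speaker spelled this text


@dataclass
class Session:
    """Hypothetical dictation session state."""
    buffer: List[BufferEntry] = field(default_factory=list)
    vocabulary: Set[str] = field(default_factory=set)
    active: bool = True


def handle_stop(session: Session) -> None:
    """Add spelled-out text to the vocabulary, then end the session."""
    for entry in session.buffer:
        if entry.spelled_out:
            session.vocabulary.add(entry.text)
    session.active = False
```

Adding only spelled-out text to the vocabulary reflects the reasoning in the paragraph above: text that had to be spelled was presumably not recognizable as a word beforehand.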
The first speech correction command can further indicate a preference to correct misrecognized text in the text buffer. Responsive to detecting this type of first speech correction command in the speech recognized text, a list of speech correction candidates can be audibly played back, wherein each speech correction candidate in the list is statistically alternative recognized text to the audio speech input. Subsequently, a selection of one of the speech correction candidates in the list can be received; and, the misrecognized text in the text buffer can be replaced with the selected speech correction candidate.
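The candidate-selection flow above can be sketched as below. The `speak` and `listen_for_choice` callbacks stand in for the audio output and the speech input of the device and are hypothetical, as is the convention that the speaker selects a candidate by its spoken number.

```python
from typing import Callable, List


def correct_with_candidates(
    buffer: List[str],
    index: int,
    candidates: List[str],
    speak: Callable[[str], None],
    listen_for_choice: Callable[[], int],
) -> None:
    """Play back alternative recognitions and replace the chosen entry.

    `index` identifies the misrecognized text in the buffer, and
    `candidates` holds the alternative recognized text for that input.
    """
    # Audibly play back the numbered list of correction candidates.
    for number, candidate in enumerate(candidates, start=1):
        speak(f"{number}: {candidate}")

    # Receive the speaker's selection (1-based, an assumed convention).
    choice = listen_for_choice()
    if not 1 <= choice <= len(candidates):
        raise ValueError("selection out of range")

    # Replace the misrecognized text with the selected candidate.
    buffer[index] = candidates[choice - 1]
```

Passing the audio functions in as callbacks keeps the sketch independent of any particular text-to-speech or recognition engine.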
Instead of receiving a selection, a second speech correction command can be received indicating both preferred replacement text and a preference to replace the misrecognized text with the preferred replacement text in the text buffer. Responsive to receiving such second speech correction command, the misrecognized text in the text buffer can be replaced with the preferred replacement text. Additionally, the second speech correction command can indicate a preference to replace the misrecognized text in the text buffer with spelled-out replacement text. Responsive to receiving such second speech correction command, audibly spelled-out replacement text can be accepted, the audibly spelled-out replacement text comprising a series of spoken alphanumeric characters. The series of spoken alphanumeric characters can be speech-to-text converted and each speech-to-text converted alphanumeric character stored in a temporary buffer. The speech-to-text converted alphanumeric characters can be combined into spelled out replacement text and the misrecognized text in the text buffer can be replaced with the spelled out replacement text. In the preferred embodiment, prior to accepting audibly spelled out replacement text, a pre-stored set of instructions for providing the spelled out replacement text can be audibly played.
Notably, a third speech correction command can be detected in the audibly spelled-out replacement text. The third speech correction command can indicate a preference to delete a particular alphanumeric character stored in the temporary buffer. Responsive to detecting such third speech correction command, the particular alphanumeric character can be deleted from the temporary buffer. Additionally, the third speech correction command can indicate both a preferred replacement alphanumeric character and a preference to replace a particular alphanumeric character with the preferred replacement alphanumeric character in the temporary buffer. Responsive to detecting such third speech correction command, the particular alphanumeric character can be replaced with the preferred alphanumeric character in the temporary buffer.
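The spelling procedure and its character-level corrections can be sketched as follows. The summary does not say how the "particular" character is identified, so this sketch assumes that both commands act on the most recently stored character; the command words "delete" and "replace" and the function name are likewise assumptions.

```python
from typing import List


def spell_replacement(tokens: List[str]) -> str:
    """Combine spoken characters into spelled-out replacement text.

    `tokens` is the sequence of speech-to-text converted items, each
    either a single character or one of the assumed command words.
    """
    temp: List[str] = []  # temporary buffer of converted characters
    i = 0
    while i < len(tokens):
        token = tokens[i].lower()
        if token == "delete":
            # Assumed behavior: drop the most recent character.
            if temp:
                temp.pop()
            i += 1
        elif token == "replace":
            # Assumed behavior: the next token replaces the most
            # recent character in the temporary buffer.
            if i + 1 >= len(tokens):
                raise ValueError("'replace' needs a replacement character")
            if temp:
                temp[-1] = tokens[i + 1]
            i += 2
        else:
            temp.append(tokens[i])
            i += 1
    return "".join(temp)
```

The returned string would then take the place of the misrecognized text in the text buffer.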
The foregoing and other objects, advantages, and aspects of the present invention will become apparent from the following description. In the description, reference is made to the accompanying drawings which form a part hereof, and in which there is shown, by way of illustration, a preferred embodiment of the present invention. Such embodiment does not necessarily represent the full scope of the invention, however, and reference must also be made to the claims herein for properly interpreting the scope of this invention.
BRIEF DESCRIPTION OF THE SEVERAL
VIEWS OF THE DRAWINGS
FIG. 1 illustrates a computer apparatus by which the method of the present invention may be practiced;
FIG. 2 is a block diagram showing a typical high-level computer architecture for use with the computer apparatus of FIG. 1;
FIG. 3 is a flowchart illustrating a method for dictating a body of text according to the present invention;
FIG. 4 is a flowchart illustrating a method for implementing the Stop command of FIG. 3;
FIG. 5 is a flowchart illustrating a method for implementing the Correct command of FIG. 3; and
FIG. 6 is a flowchart illustrating a method for implementing the Spell command of FIG. 5.
DETAILED DESCRIPTION OF THE
The present invention is an apparatus and method of correcting misrecognized speech in a speech recognition application operating in a computer device having limited or no display. To compensate for the limited keyboard input and display output capabilities of the computer device, the method of the invention can provide audio feedback to a speaker to facilitate the speaker's identification of misrecognition errors. Additionally, the method of the invention can provide speech command and control functionality for correcting misrecognitions. Such functionality can include "Delete" and "Replace" speech commands. Moreover, such functionality can include a "Spell Word" function for providing to the speech recognition application an exact spelling of a misrecognized word.
FIG. 1 illustrates a computer device 10 having limited or no display by which the method of the present invention may be practiced. The computer device 10 can be embedded in a vehicle; for instance, the computer device 10 can be incorporated in a vehicle navigation system. Alternatively, the computer device 10 can be included as part of a portable computing device or wearable computer. Finally, the computer device 10 can be included in a telephony system. Still, the invention is not limited in regard to the form or use of the computer device 10. Rather, the spirit and scope of the invention includes all computer devices having a limited or no display and computer devices whose use results in a limited or no display.
The computer device 10 preferably includes a central processing unit (CPU) 12, an internal memory device 14 such as a random access memory (RAM), and a fixed storage media 16 such as flash memory or a hard disk drive. The fixed storage media 16 stores therein an Operating System 18 and a Speech Recognition Application 20 by which the method of the present invention can be practiced. Computer audio circuitry (CAC) 28 is also preferred and can be included in the computer device 10 so as to provide an audio processing capability to the computer device 10. As such, audio input means 6, for example a microphone, and audio output means 8, for example a speaker, can be provided both to receive audio input signals for processing in the computer audio circuitry 28 and to provide audio output signals processed by the computer audio circuitry 28. Notably, where the computer device 10 is included as part of a telephony system, the audio input means 6 and audio output means 8 can be included in a telephone handset used by a speaker to communicate with the telephony system.
Optionally, the computer device 10 can additionally include a keyboard (not shown) and at least one speaker interface display unit such as a VDT (not shown) operatively connected thereto for the purpose of interacting with the computer device 10. However, the invention is not limited in this regard and the computer device 10 requires neither a keyboard nor a VDT in order to suitably operate according to the inventive arrangements. In fact, the method of the invention is intended to provide a speech correction capability to devices having limited or no display and no keyboard. Hence, in the preferred embodiment, the computer device 10 does not include either a keyboard or VDT.
FIG. 2 illustrates a preferred architecture for the computer device 10 of FIG. 1. As shown in both FIGS. 1 and 2, the Operating System 18 can be stored in fixed storage 16. The Operating System 18 is preferably an embedded operating system, for example QNX Neutrino® or Wind River Systems' VxWorks®. The Operating System 18 is not limited in