US20010016815A1 - Voice recognition apparatus and recording medium having voice recognition program recorded therein - Google Patents

Publication number
US20010016815A1
Authority
US
United States
Prior art keywords
voice
voice data
data
recording medium
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/088,996
Other versions
US6353809B2 (en
Inventor
Hidetaka Takahashi
Takafumi Onishi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Olympus Corp
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP14972997A external-priority patent/JP3905181B2/en
Priority claimed from JP10011632A external-priority patent/JPH11212595A/en
Priority claimed from JP10011631A external-priority patent/JPH11212590A/en
Assigned to OLYMPUS OPTICAL CO., LTD. reassignment OLYMPUS OPTICAL CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ONISHI, TAKAFUMI, TAKAHASHI, HIDETAKE
Application filed by Individual filed Critical Individual
Assigned to OLYMPUS OPTICAL CO., LTD. reassignment OLYMPUS OPTICAL CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ONISHI, TAKAFUMI, TAKAHASHI, HIDETAKA
Publication of US20010016815A1 publication Critical patent/US20010016815A1/en
Publication of US6353809B2 publication Critical patent/US6353809B2/en
Application granted granted Critical
Assigned to OLYMPUS CORPORATION reassignment OLYMPUS CORPORATION CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: OLYMPUS OPTICAL CO., LTD
Assigned to OLYMPUS CORPORATION reassignment OLYMPUS CORPORATION CHANGE OF ADDRESS Assignors: OLYMPUS CORPORATION
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the present invention relates to a voice recognition apparatus and a recording medium having a voice recognition program recorded therein. More particularly, this invention is concerned with a voice recognition apparatus for recognizing voice data, and a recording medium in which a voice recognition program causing a computer to recognize voice data is recorded.
  • An example of a software package enabling voice recognition is a product “Voice Type 3.0 for Windows 95” released recently by IBM Ltd. This product converts voice input through a microphone into text data in real time and enjoys a considerably high recognition ratio.
  • the application software permits real-time input through a microphone that is only one means for inputting voice data.
  • An already existent voice file cannot be recognized directly.
  • One object of development of the aforesaid voice recognition technology is to realize a so-called voice word processor or a dictation system for automatically creating a document on the basis of voice data input by performing dictation, and displaying the document in a screen or the like.
  • A conventionally adopted means is such that the contents of a document to be created are dictated and temporarily recorded by a recording apparatus such as a tape recorder, and a secretary, typist, or the like then reproduces the dictated contents and documents them using a documentation apparatus such as a typewriter, word processor, or the like.
  • This style has been generally adopted as one form of effective utilization of the recording apparatus such as a tape recorder.
  • a word irrelevant to contents to be informed may be contained.
  • an incorrectly uttered word or a word having no meaning such as “Ah” or “Well” (hereinafter an unnecessary word) may be contained (frequently in some cases).
  • a voice recognition apparatus comprising: a standard pattern memory means for storing standard patterns; an unnecessary word pattern memory means for storing patterns of unnecessary words; a word spotting means for spotting as a word or word-spotting a standard pattern stored in the standard pattern memory means or a pattern of an unnecessary word stored in the unnecessary word pattern memory means on the basis of input voice, and outputting a corresponding interval and score; a producing means for hypothesizing the contents of uttered voice and producing a representation of the meaning; and an analyzing means for analyzing the result of word-spotting, which is performed by the word spotting means, on the basis of the representation of the meaning of the hypothesis produced by the producing means.
  • the analyzing means allocates a score resulting from word-spotting performed on the pattern of an unnecessary word to remaining intervals, of which corresponding standard patterns or patterns of an unnecessary word have not been word-spotted, among all the intervals of data items constituting the voice.
  • the result of word-spotting performed by the word spotting means is then analyzed.
  • the voice recognition apparatus described in the Japanese Unexamined Patent Publication No. 7-5893 has difficulty in carrying out practical processing within an existing computer (especially a computer of a personal level) because the data size of language models becomes enormous.
  • a sound-level meter for indicating a sound level of voice is displayed in, for example, a screen or the like so that a speaker himself/herself can manage his/her sound level of voice properly.
  • a sound pressure level display for a voice recognition apparatus comprising a first sound receiver for receiving a voice signal, a second sound receiver for receiving a noise whose level is close to that of the voice signal received by the first sound receiver, a sound pressure level ratio calculating means for calculating a ratio of a sound pressure level of a voice signal input to the first sound receiver to a sound pressure level of a noise input to the second sound receiver, and a display means for displaying the ratio of sound pressure levels calculated by the sound pressure level ratio calculating means is described in Japanese Unexamined Patent Publication No. 5-231922.
  • the first object of the present invention is to provide a voice recognition apparatus for recognizing voice represented by voice data recorded in a given recording medium and a recording medium in which a voice recognition program is recorded.
  • the second object of the present invention is to provide a voice recognition apparatus capable of treating an unnecessary word or the like contained in voice without the need of especially fast processing, and a recording medium in which a voice recognition program is recorded.
  • the third object of the present invention is to provide a voice recognition apparatus capable of recognizing voice on a stable basis irrespective of a sound level indicated by recorded voice data, and a recording medium in which a voice recognition program is recorded.
  • a recording medium in accordance with the present invention having a voice recognition program recorded therein is used to run the voice recognition program in a computer, whereby the voice recognition program causes the computer to read voice data from a voice data recording medium in which the voice data is recorded, recognize voice represented by the voice data so as to convert the voice into text data, and display the text data.
  • FIG. 1 is a block diagram schematically showing the configuration of a computer that is the first embodiment of a voice recognition apparatus in accordance with the present invention
  • FIG. 2 is a flowchart describing the first example (first voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment;
  • FIG. 3 is a diagram showing an example of display appearing when voice recognition application software read from the first recording medium is activated in the computer of the first embodiment, or a main screen used to reproduce compressed voice data;
  • FIG. 4 is a diagram showing an example of a screen in which text data is displayed when the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment;
  • FIG. 5 is a diagram showing an example of a dialog box screen used to set a time interval between voice recognitions and the number of displayed words when a given number of words are recognized at intervals of a given time since the start of a file subjected to voice recognition, after the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment;
  • FIG. 6 is a diagram showing an example of a screen in which a given number of words recognized at intervals of a given time since the start of a file subjected to voice recognition after the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment is displayed;
  • FIG. 7 is a flowchart describing a second example (second voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment;
  • FIG. 8 is a flowchart describing a third example (third voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment;
  • FIG. 9 is a diagram showing an example of a dialog box screen used to set a word to be retrieved for voice recognition when only a word that must be recognized in voice and contained in a voice compressed file is recognized in voice after the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment;
  • FIG. 10 is a flowchart describing a fourth example (fourth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment;
  • FIG. 11 is a flowchart describing a fifth example (fifth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment;
  • FIG. 12 is a conceptual diagram showing the overall configuration of a dictation system of the second embodiment of the present invention.
  • FIG. 13 is a block diagram showing the electrical configuration of a digital recorder of the second embodiment
  • FIG. 14 is a diagram showing a scene in which a check mark button of the digital recorder is handled during dictation in the second embodiment
  • FIG. 15 is a diagram showing the format of data to be recorded in a voice memory of a miniature card by means of the digital recorder of the second embodiment
  • FIG. 16 is a block diagram showing the electrical configuration of a personal computer of the second embodiment
  • FIG. 17 is a flowchart describing voice recognition carried out in the personal computer of the second embodiment
  • FIG. 18 is a diagram showing an overall flow of reading voice data from a voice memory and recognizing voice which is followed by the dictation system of the third embodiment of the present invention.
  • FIG. 19 is a flowchart describing voice recognition carried out by a dictation system of the third embodiment of the present invention.
  • FIG. 20 is a flowchart describing the contents of processing relevant to judgment of voice or voiceless which is briefed in FIG. 19;
  • FIG. 21 is a flowchart describing the contents of gain calculation briefed in FIG. 19.
  • FIG. 1 is a block diagram schematically showing the configuration of a computer that is the first embodiment of a voice recognition apparatus in accordance with the present invention.
  • a computer 1 consists, as shown in FIG. 1, mainly of: a central processing unit (CPU) 1 a responsible for control of the whole computer 1 ; a first input unit 5 in which an external recording medium (first recording medium 7 ) having a given program recorded therein can be mounted freely; a first recording medium driver 6 , incorporated in the first input unit 5 , for reading a given program from the first recording medium 7 under the control of the CPU 1 a when the first recording medium 7 is mounted in the first input unit 5 ; a second input/output unit 8 in which an external recording medium (second recording medium 10 ) having given voice data recorded therein can be mounted freely; a second recording medium driver 9 , incorporated in the second input/output unit 8 , for reading given voice data and writing given data from and in the second recording medium 10 under the control of the CPU 1 a when the second recording medium 10 is mounted in the second input/output unit 8 ; an operation unit 2 for inputting a given instruction entered by a user; a display unit 3 serving as a
  • the computer 1 is configured to permit operation of an operating system (OS) capable of executing a plurality of application software programs concurrently (multitasking).
  • the first recording medium 7 is a recording medium in which a given voice recognition program is recorded.
  • a portable recording medium such as a CD-ROM or floppy disk is assumed as the recording medium.
  • the second recording medium 10 is a voice data recording medium in which given voice data is recorded.
  • the second recording medium 10 will be described below.
  • the second recording medium 10 is a recording medium in which voice data acquired by an external solid-state recorder is recorded.
  • a card-shaped recording medium that is a flash memory is assumed.
  • In recent years, there has been an increasing demand for flash memory. Digital solid-state recorders using a flash memory as a recording medium have been commercialized.
  • the flash memory is known in many types of card-shaped recording media. For example, a memory card conformable to the PCMCIA standard, a miniature card manufactured by Intel Corp., an SSDFC manufactured by Toshiba Co., Ltd., and a compact flash memory manufactured by SunDisk Co., Ltd. are known.
  • these card-shaped flash memories are connected to a personal computer via an adaptor or the like, and capable of transferring given data.
  • Many of the existing card-shaped memories have a storage capacity ranging from 2M bytes to 8M bytes.
  • the digital solid-state recorders currently on the market include those capable of recording sound in a card having a storage capacity of 2M bytes for 20 min. to 40 min.
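The recording times quoted above imply an average data rate for the compressed voice, which can be worked out directly. The following is a minimal sketch of that arithmetic; it assumes that "2M bytes" means 2 × 1024 × 1024 bytes and that the whole card holds voice data, neither of which the source states.

```python
def average_bit_rate(capacity_bytes: int, minutes: float) -> float:
    """Average bits per second needed to fill capacity_bytes in the given time."""
    return capacity_bytes * 8 / (minutes * 60)


# Assumption: "2M bytes" is taken as 2 MiB; the source does not say which.
CARD_BYTES = 2 * 1024 * 1024

for minutes in (20, 40):
    rate = average_bit_rate(CARD_BYTES, minutes)
    print(f"{minutes} min -> {rate / 1000:.1f} kbit/s")
```

Under these assumptions the rate is roughly 14 kbit/s for 20 minutes and roughly 7 kbit/s for 40 minutes.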
  • the solid-state recorders convert an analog signal input through a microphone into digital data such as PCM data, which is digital data modulated in pulse code, compress the PCM data according to an encoding algorithm based on ADPCM or CELP, and record the compressed data in a flash memory card.
  • the thus recorded data can be read directly by a personal computer via an adaptor.
  • the computer 1 of this embodiment reads voice data from the flash memory card (second recording medium 10 ) mounted as mentioned above.
  • a user mounts a recording medium (first recording medium 7 ), in which a given voice recognition program is recorded, in the first input unit 5 of the computer 1 .
  • the computer 1 reads a given voice recognition program, which is application software, from the connected first recording medium 7 into an internal memory, which is not shown, via the first recording medium driver 6 . This causes the CPU 1 a to control a voice recognition operation following the program.
  • FIG. 2 is a flowchart describing the first example (first voice recognition program) of a voice recognition program recorded in the recording medium in accordance with the present invention having the voice recognition program recorded therein.
  • the CPU 1 a reads voice data from a voice compressed file containing voice data compressed and recorded by an external solid-state recorder (step S 1 ).
  • the first voice recognition program stretches compressed voice data into PCM data by reversely following a compression algorithm according to which data is recorded by the solid-state recorder (step S 2 ). In other words, this processing that is identical to reproduction performed by the solid-state recorder is carried out by the computer 1 controlled by the first voice recognition program.
  • the PCM data stretched at step S 2 is subjected to voice recognition (step S 3 ).
  • the voice-recognized data or data recognized in voice is converted into text data (step S 4 ), and the converted text data is displayed on a display (display unit 3 ) (step S 5 ). This processing is continued until the voice-recognized data comes to an end (step S 6 ).
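The loop of steps S 1 to S 6 can be sketched as follows. This is only an illustration under stated assumptions: `decode` and `recognize` are hypothetical stand-ins for the decompression algorithm and the recognition engine, which the source does not specify, and the block-by-block reading is likewise assumed.

```python
from typing import Callable, Iterable, Iterator, List


def recognize_file(
    compressed_blocks: Iterable[bytes],
    decode: Callable[[bytes], bytes],
    recognize: Callable[[bytes], List[str]],
) -> Iterator[str]:
    """Yield one piece of text per block until the voice data ends."""
    for block in compressed_blocks:      # S1: read compressed voice data
        pcm = decode(block)              # S2: stretch (decompress) into PCM data
        words = recognize(pcm)           # S3: voice recognition on the PCM data
        text = " ".join(words)           # S4: convert the result into text data
        if text:
            yield text                   # S5: the caller displays the text
    # S6: the loop ends when the data comes to an end


# Usage with trivial stand-ins (hypothetical; no real codec or recognizer).
fake_blocks = [b"\x01", b"\x02"]
for line in recognize_file(fake_blocks, lambda b: b * 2, lambda pcm: ["word%d" % pcm[0]]):
    print(line)
```

Because nothing in the loop depends on wall-clock time, the same structure works however slowly `recognize` runs.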
  • FIG. 3 shows an example of display appearing when the voice recognition application software read from the first recording medium 7 is activated in the computer 1 of this embodiment, or a main screen used to reproduce voice data that is compressed data representing voice.
  • FIG. 3 shows a main screen 11 in which the following are displayed: a menu bar 12 used to select file-related handling or editing-related handling; a tool button bar 13 presenting various kinds of handling in the form of easily discernible icons; a voice file list box 14 which displays a list of information such as names of voice files transferred from the second recording medium 10 , recording times, dates of recording, and priorities and in which a voice file whose data is reproduced or voice-recognized is highlighted in contrast with the other voice files; and a reproduction control 18 used to carry out processing such as replay, stop, fast feed, or fast return.
  • the tool button bar 13 is provided with a voice recognition tool button group 21 consisting of a voice recognition start button 22 , word recognition button 23 , and list display button 24 .
  • the reproduction control 18 is provided with a current position-of-reproduction indicator slider 15 , lines 16 , and an index search button 17 .
  • the list display button 24 belonging to the voice recognition tool button group 21 is a button used to recognize a certain number of words at intervals of a certain time since the start of a file subjected to voice recognition, and display the words in the form of a list.
  • a dialog box shown in FIG. 5 appears.
  • a user is prompted to enter the setting of a time in sec, at intervals of which words will be recognized, since the start of a file (file subjected to voice recognition) highlighted in the voice file list box 14 , and the setting of the number of words to be recognized and displayed. If the user wants to suspend the processing, he/she presses a cancel button shown in FIG. 5. Thus, control can be returned to the main screen shown in FIG. 3.
  • FIG. 7 is a flowchart describing the second example (second voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein.
  • a processing operation of recognizing a given number of words at intervals of a certain time since the start of a file subjected to voice recognition, and displaying the words in the form of a list is described.
  • voice data is first read from a file subjected to voice recognition and recorded in the second recording medium 10 (step S 11 ).
  • the second voice recognition program stretches the compressed voice data in the same manner as the first voice recognition program (step S 12 ). If a word coincident with a time instant when the set time has elapsed is detected (step S 13 ), stretched PCM data starting with the word is voice-recognized (step S 14 ).
  • the voice-recognized data is converted into text data (step S 15 ), and the converted text data is, as shown in FIG. 6, displayed by the given number of words on the display (display unit 3 ). Specifically, in the list box shown in FIG. 6, display of a position-of-reproduction time passed since the start of the voice-recognized file and display of text data starting at the position of reproduction are carried out sequentially by the number of words set in the dialog box shown in FIG. 5. This processing is terminated when data comes to an end (step S 17 ).
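The list display described above can be sketched as follows. The sketch assumes that recognition results are already available as (time in seconds, word) pairs sorted by time, which is a simplification: the second voice recognition program starts recognition at each set time rather than filtering a complete transcript. The function and variable names are hypothetical.

```python
from typing import List, Tuple

TimedWord = Tuple[float, str]  # (seconds since start of file, recognized word)


def words_at_intervals(
    words: List[TimedWord], interval_s: float, count: int
) -> List[Tuple[float, str]]:
    """For each multiple of interval_s, list `count` words starting there.

    Each row holds the position-of-reproduction time of the first word
    found at or after the interval boundary and the text that follows.
    """
    if interval_s <= 0 or count <= 0:
        raise ValueError("interval_s and count must be positive")
    rows: List[Tuple[float, str]] = []
    if not words:
        return rows
    end = words[-1][0]
    k = 0
    while k * interval_s <= end:
        boundary = k * interval_s
        # A match always exists because boundary <= time of the last word.
        start = next(i for i, (t, _) in enumerate(words) if t >= boundary)
        chunk = words[start:start + count]
        rows.append((chunk[0][0], " ".join(w for _, w in chunk)))
        k += 1
    return rows


sample = [(0.0, "the"), (0.5, "sky"), (1.2, "is"), (2.1, "blue"),
          (3.0, "and"), (4.4, "the"), (5.0, "ocean")]
for position, text in words_at_intervals(sample, interval_s=2.0, count=2):
    print(f"{position:5.1f} s  {text}")
```

The two parameters correspond to the time interval and the number of displayed words set in the dialog box of FIG. 5.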
  • FIG. 8 is a flowchart describing the third example (third voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, wherein a processing operation of starting voice recognition at a given position in a file subjected to voice recognition and displaying the result is described.
  • voice data is read from a file subjected to voice recognition in the second recording medium (step S 21 ).
  • the third voice recognition program stretches compressed voice data in the same manner as the first voice recognition program (step S 22 ). If a word coincident with a given position is detected (step S 23 ), stretched PCM data starting with the word at the given position is voice-recognized (step S 24 ).
  • the voice-recognized data is converted into text data (step S 25 ), and the converted text data is displayed on the display (display unit 3 ) (step S 26 ). In other words, text data starting at the given position set in the editor screen shown in FIG. 4 is displayed. This processing is terminated when data comes to an end.
  • the word recognition button 23 belonging to the voice recognition tool button group 21 shown in FIG. 3 is a button for use in voice-recognizing a desired word, which should be voice-recognized, among those contained in a file subjected to voice recognition, and indicating the positions of the desired word. Specifically, when the word recognition button 23 is pressed, only the word that should be voice-recognized is retrieved from a voice-compressed file by carrying out voice recognition. Retrieved locations are indicated with the lines 16 in the current position-of-reproduction indicator slider 15 so that they can be discerned at sight. The details will be described below.
  • the dialog box shown in FIG. 9 appears. With the dialog box, a user is prompted to enter a specified word that should be recognized. For suspending this processing, the cancel button is pressed. The processing is then exited and the main screen shown in FIG. 3 is returned.
  • FIG. 10 is a flowchart describing the fourth example (fourth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, wherein a processing operation of voice-recognizing desired words alone, which should be voice-recognized, among those contained in a file subjected to voice recognition, and indicating the positions of the desired words is described.
  • voice data is read from a file subjected to voice recognition in the second recording medium (step S 31 ).
  • the fourth voice recognition program stretches compressed voice data in the same manner as the first voice recognition program (step S 32 ). Voice recognition is then started at the start of the selected voice-compressed file (step S 33 ).
  • when the word registered in the dialog box shown in FIG. 9 is recognized from among those contained in the file subjected to voice recognition (step S 34 ), the positions of the word are indicated with the lines 16 in the current position-of-reproduction indicator slider 15 in the main screen 11 shown in FIG. 3. An index mark is inserted into a voice data item coincident with the position. Every time the index search button 17 in the reproduction control 18 in the main screen 11 shown in FIG. 3 is pressed, control is skipped sequentially to one of the positions indicated with the lines 16 (step S 35 and step S 36 ). This facility can be validated not only when reproduction is stopped but also when reproduction is under way.
  • This processing is terminated when data comes to an end (step S 37 ).
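The keyword search and index skip described above can be sketched as follows. As an assumption for illustration, recognition results are taken to be (time in seconds, word) pairs sorted by time, and an index mark is represented simply by the time of a matching word; the source does not specify how index marks are stored. All names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def find_keyword_positions(words: List[Tuple[float, str]], keyword: str) -> List[float]:
    """Return the times at which the registered word was recognized."""
    key = keyword.lower()
    return [t for t, w in words if w.lower() == key]


def next_index(positions: List[float], current: float) -> Optional[float]:
    """Return the first marked position strictly after `current`, if any.

    `positions` must be sorted in ascending order.
    """
    i = bisect_right(positions, current)
    return positions[i] if i < len(positions) else None


sentence = "The sky is blue and the ocean is also blue".split()
timed = [(float(i), w) for i, w in enumerate(sentence)]
marks = find_keyword_positions(timed, "blue")
print(marks)                    # positions that would be drawn as lines 16
print(next_index(marks, 0.0))   # first press of the index search button
```

Each further press of the index search button would call `next_index` again with the current position of reproduction.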
  • FIG. 11 is a flowchart describing the fifth example (fifth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, wherein a processing operation of deleting a portion of voice data corresponding to a designated portion of text data from the second recording medium 10 is described.
  • voice data is read from a file subjected to voice recognition in the second recording medium 10 (step S 41 ).
  • the fifth voice recognition program stretches compressed voice data in the same manner as the first voice recognition program (step S 42 ).
  • the stretched PCM data is voice-recognized (step S 43 ).
  • the voice-recognized data is converted into text data (step S 44 ). Addresses in the second recording medium 10 associated with words are detected and then listed (step S 45 ). Table 1 indicates the addresses in the second recording medium 10 allocated to an example of text data “The sky is blue and the ocean is also blue.”

    TABLE 1
    Leading and last addresses in a recording medium

    Text   Word    Leading address   Last address
    1      the     3468H             3492H
    2      sky     3494H             3560H
    3      is      3580H             3600H
    4      blue    3610H             3620H
    5      and     3622H             3640H
    6      the     3692H             3699H
    7      ocean   3706H             3720H
    8      is      3724H             3736H
    9      also    3740H             3753H
    10     blue    3760H             3770H
  • Thereafter, the above text data is kept displayed on the display until the data comes to an end (step S 46 and step S 47 ).
  • When data comes to an end, it is judged whether or not the text data should be deleted (step S 48 ). If the data should be deleted, a position of deletion is designated in the text data (step S 49 ). Addresses in the second recording medium 10 associated with the designated position are retrieved from Table 1 (step S 50 ).
  • voice data is read from the second recording medium 10 (step S 51 ), and stretched (step S 52 ).
  • the portion of the voice data defined by the addresses is deleted (step S 53 ).
  • the voice data is compressed again (step S 54 ) and then overwritten (step S 55 ).
  • addresses are listed so that a position of deletion in text data can be associated with a position in the second recording medium.
  • the present invention is not limited to this mode. For example, times passed since the start of a file may be recorded in the form of a list.
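The address lookup of step S 50 and the deletion of step S 53 can be sketched as follows, using the values of Table 1. This is a simplified illustration: a `bytearray` stands in for the voice data, the addresses are treated as inclusive byte offsets into it, and the stretching and re-compression of steps S 52 and S 54 are omitted. These choices and all names are assumptions, not details given in the source.

```python
from typing import List, Tuple

# (word, leading address, last address), copied from Table 1.
ADDRESS_TABLE: List[Tuple[str, int, int]] = [
    ("the", 0x3468, 0x3492),
    ("sky", 0x3494, 0x3560),
    ("is", 0x3580, 0x3600),
    ("blue", 0x3610, 0x3620),
    ("and", 0x3622, 0x3640),
    ("the", 0x3692, 0x3699),
    ("ocean", 0x3706, 0x3720),
    ("is", 0x3724, 0x3736),
    ("also", 0x3740, 0x3753),
    ("blue", 0x3760, 0x3770),
]


def address_range(
    table: List[Tuple[str, int, int]], first_word: int, last_word: int
) -> Tuple[int, int]:
    """Map 1-based, inclusive word numbers to (leading, last) addresses."""
    if not 1 <= first_word <= last_word <= len(table):
        raise ValueError("word numbers out of range")
    return table[first_word - 1][1], table[last_word - 1][2]


def delete_range(data: bytearray, start: int, end: int) -> None:
    """Delete the bytes from start to end inclusive, in place."""
    del data[start:end + 1]


voice = bytearray(0x4000)                      # stand-in for the voice data
start, end = address_range(ADDRESS_TABLE, 9, 9)  # the word "also"
delete_range(voice, start, end)
print(hex(start), hex(end), len(voice))
```

If times since the start of the file were listed instead of addresses, as the text allows, only the table contents and the unit of `delete_range` would change.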
  • According to the voice recognition program of the first embodiment recorded in a recording medium to be adapted to a computer, the following advantage is obtained. Conventionally, a CPU is requested to exhibit a great processing capability because, when voice input through a microphone is recognized directly, voice recognition must be carried out in real time. Here, however, stretching of a voice-compressed file and voice recognition need merely be repeated, so real-time processing is not required and the CPU is not requested to exhibit a great processing capability.
  • Within an existing voice-compressed file, control can be skipped directly to the position of a word serving as a keyword, so a position of the word that should be retrieved can be reached at once.
  • the first recording medium 7 is an external recording medium. After a recording medium having a given voice recognition program recorded therein is mounted in the computer 1 , the given voice recognition program that is application software can be read from the recording medium.
  • the present invention is not limited to this mode. Alternatively, any mode will do as long as a given voice recognition program can be activated by working on the CPU 1 a in the computer.
  • the computer 1 may be provided with a recording medium having a voice recognition program recorded therein in advance so that the voice recognition program can be read any time.
  • FIGS. 12 to 17 relate to the second embodiment of the present invention.
  • FIG. 12 is a conceptual diagram showing the overall configuration of a dictation system to which the present invention is adapted.
  • the dictation system comprises: as shown in FIG. 12, a digital recorder 26 that is a voice recording apparatus for converting voice into an electric signal and producing voice data; a miniature card 10 A, freely detachably attached to the digital recorder 26 , serving as a voice data recording medium in which voice data is recorded; a PC card adaptor 27 used to insert the miniature card 10 A into a PC card slot 9 A (See FIG. 12
  • a personal computer 1 A including a display 3 A serving as a display means, and a keyboard 2 A and mouse 2 B serving as an operation unit, and acting as a voice recognition apparatus for processing voice data read from the miniature card 10 A through the PC card slot 9 A according to a control program 28 or a voice recognition program 29 .
  • FIG. 13 is a block diagram showing the electrical configuration of the digital recorder 26 .
  • the digital recorder 26 comprises: as shown in FIG. 13, a microphone 31 serving as a voice data input means for inputting voice and converting it into an electric signal; a microphone amplifier 32 for amplifying a voice signal sent from the microphone 31 to a proper level; a lowpass filter 33 for removing unnecessary high-frequency components from the voice signal amplified by the microphone amplifier 32 ; an A/D converter 34 for converting an analog voice signal output from the lowpass filter 33 into digital data; an encoder-decoder 35 for encoding (compressing) the digitized voice signal during a recording operation, and decoding (stretching) encoded data during a reproduction operation; a memory control unit 36 serving as a recording means for controlling recording or reproduction of voice information in or from a voice memory 37 , which will be described later, on the basis of address information given by a system control unit 38 to be described later; a voice memory 37 incorporated in the miniature card 10 A serving as a voice data recording medium and formed with, for example, a semiconductor memory; a miniature
  • a system control unit 38 that controls the digital recorder 26 including the encoder-decoder 35 , memory control unit 36 , and voice memory 37 in a centralized manner and that serves as a recording means to which an output terminal of the operation input unit 43 is connected.
  • FIG. 14 is a diagram showing a scene in which the check mark button of the digital recorder is handled during dictation.
  • As shown in FIG. 14, the check mark button 43 a, serving as an interval designating means of the operation input unit 43, is located at a position where the thumb of the hand holding the digital recorder 26 can operate it easily.
  • The check mark button is pressed in order to append a check mark to voice data when an unnecessary word or the like is uttered while the contents of a document to be created are being dictated. The check mark indicates that the uttered word is an unnecessary word.
  • An unnecessary word or the like is uttered unconsciously. However, the instant an unnecessary word is uttered, the speaker can recognize it as an unnecessary word. Since the check mark button 43 a is located at a position enabling the speaker to press it easily, a check mark can be appended readily whenever necessary.
  • FIG. 15 is a diagram showing the format of data to be recorded in the voice memory 37 in the miniature card 10 A by the digital recorder 26 .
  • Each recording is managed in the form of a file.
  • Information such as, for example, a date of recording and a recording time is written as a file header.
  • Data divided into frames is then written.
  • Each frame includes check mark information indicating whether or not the check mark button 43 a has been pressed, and encoded voice data.
  • The check mark information is structured as a flag that is, for example, 1 bit long. When the check mark button 43 a is pressed, the flag is set to “1.” When the check mark button 43 a is not pressed, the flag is set to “0.”
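The frame format described above can be illustrated with a minimal Python sketch. The byte layout is an assumption made only for illustration: the text states that the flag is, for example, 1 bit long, but it does not say how the flag is packed, so the sketch stores it in the lowest bit of one leading byte. The names `Frame`, `pack_frame`, and `unpack_frame` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    check_mark: bool  # True when the check mark button was pressed
    voice: bytes      # encoded (compressed) voice data of this frame


def pack_frame(frame: Frame) -> bytes:
    # Assumed layout: one flag byte (lowest bit used) followed by the voice data.
    flag = 1 if frame.check_mark else 0
    return bytes([flag]) + frame.voice


def unpack_frame(raw: bytes) -> Frame:
    # Read the flag from the lowest bit of the first byte; the rest is voice data.
    return Frame(check_mark=bool(raw[0] & 1), voice=raw[1:])
```

A real recorder would also need a frame length or a fixed frame size to split a file into frames; that detail is not given in the text and is left out here.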
  • FIG. 16 is a block diagram showing the electrical configuration of the personal computer 1 A.
  • the personal computer 1 A carries out voice reproduction, information display, and the like according to the control program 28 , carries out documentation according to the voice recognition program 29 , and also carries out various kinds of processing according to the other various kinds of programs.
  • the personal computer 1 A comprises: a CPU 51 serving as a detecting means, a level adjusting means, a voice recognizing means, a voice rating means, a minimum value calculating means, a gain value calculating means, a multiplying means, and an averaging means; a main memory 52 serving as a recording medium offering a work area for the CPU 51 ; an internal recording medium 53 serving as a recording medium which is formed with, for example, a hard disk or floppy disk and in which the control program 28 and voice recognition program 29 are recorded; an external port 54 used to connect the personal computer to various kinds of external equipment; an interface 55 used to connect the display 3 A to the personal computer; an interface 56 used to connect the keyboard 2 A or mouse 2 B; a loudspeaker 4 A that is a voice output unit
  • Voice data may be read directly from the miniature card 10 A via the PC card slot 9 A.
  • Alternatively, the voice data may be temporarily recorded in the internal recording medium 53 and then read from the internal recording medium 53.
  • The voice data may also be read directly from the digital recorder 26 via a communication means or the like.
  • In short, the voice data reading means is not limited to the PC card slot.
  • FIG. 17 is a flowchart describing processing of voice recognition carried out in the personal computer 1 A.
  • As described later, voice recognition is carried out stepwise in the order of phoneme recognition, word recognition, and sentence recognition.
  • When the voice recognition start button 22 belonging to the voice recognition tool button group 21 in the tool button bar 13 in the main screen 11 is clicked, voice recognition is started.
  • A voice file highlighted in the voice file list box 14 is read in units of a given frame (step S 61), and decoded in units of the frame (step S 62).
  • The decoded voice data is passed to the voice recognition program 29.
  • A phoneme is then identified (step S 63).
  • Word recognition is then carried out, wherein a word stream that matches input voice most satisfactorily is retrieved on the basis of a given language model suggested by the identified phoneme (step S 64 ).
  • The language model is a model giving a probability of occurrence that suggests a given word stream.
  • Various forms of language model have been conceived.
  • However, an efficient model taking account of unnecessary words or the like has not been devised yet.
  • Check mark information located at the start of each frame shown in FIG. 15 is checked to see if a word represented by data in a frame immediately preceding the frame is an unnecessary word or the like.
  • It is judged whether or not the check mark information is 1 (step S 65). If the check mark information is 1, a word represented by data in a frame immediately preceding the frame is excluded from sentence recognition, which is the next step (step S 66). If the check mark information is 0, sentence recognition is carried out (step S 67).
  • Character conversion is then carried out to convert voice data into character codes on the basis of a recognized sentence (step S 68).
  • The result of recognition is displayed in a screen on the display 3 A (step S 69).
  • It is then judged whether or not the voice file has come to an end (step S 70). If the voice file has not come to an end, control is returned to step S 61. If the voice file has come to an end, the processing is terminated.
  • The control program 28 causes the personal computer 1 A to fetch voice data from the miniature card 10 A, and to detect check mark information appended to the voice data. If the check mark information is 1, the voice data is not passed to the voice recognition program 29. If the check mark information is 0, the voice data is passed to the voice recognition program 29.
  • In the above description, a word represented by data in a frame immediately preceding a frame including check mark information of 1 is not regarded as an object of voice recognition.
  • However, the present invention is not limited to this mode.
  • Instead, a word represented by data in the frame that itself includes check mark information of 1 may be excluded from voice recognition.
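The two exclusion modes described above can be sketched in Python as follows. This is only an illustration under a strong simplifying assumption: each frame is taken to carry exactly one word, which the text does not state (a real word usually spans several frames). The function name `frames_to_recognize` and the mode names `"preceding"` and `"same"` are hypothetical.

```python
from typing import List, Tuple

# (check mark flag, placeholder for the word carried by the frame)
Frame = Tuple[int, str]


def frames_to_recognize(frames: List[Frame], mode: str = "preceding") -> List[str]:
    """Return the words that remain objects of recognition.

    mode="preceding": a flag of 1 excludes the immediately preceding frame.
    mode="same":      a flag of 1 excludes the flagged frame itself.
    """
    if mode not in ("preceding", "same"):
        raise ValueError("mode must be 'preceding' or 'same'")
    excluded = set()
    for idx, (flag, _) in enumerate(frames):
        if flag != 1:
            continue
        if mode == "same":
            excluded.add(idx)
        elif idx > 0:
            excluded.add(idx - 1)
    return [word for idx, (_, word) in enumerate(frames) if idx not in excluded]
```

For example, with the frames `[(0, "send"), (0, "uh"), (1, "the"), (0, "report")]`, the "preceding" mode drops "uh", whereas the "same" mode drops "the".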
  • In the above description, the result of voice recognition is displayed as characters on the display 3 A.
  • However, the present invention is not limited to this mode.
  • The characters may be output as character data to a recording medium, or may be displayed and output simultaneously.
  • Likewise, the check mark information has been described as being recorded during recording by the digital recorder 26.
  • Alternatively, the system may be configured so that the check mark information can be designated during reproduction by the digital recorder 26 or reproduction by the personal computer 1 A.
  • According to the second embodiment, a check mark is recorded in voice data.
  • The check mark is then detected at the time of voice recognition.
  • A word represented by data in a frame having a check mark inscribed therein, or a word represented by data in a frame preceding or succeeding that frame, is not regarded as an object of voice recognition. Consequently, an unnecessary word or the like, which could not be handled in the past, can be treated easily without increasing the load of voice recognition, that is, without the need of especially fast processing. This results in a good-quality dictation system capable of achieving voice recognition properly and creating a document with few mistakes.
  • FIGS. 18 to 21 relate to the third embodiment of the present invention.
  • the conceptual overall configuration of a dictation system of the third embodiment is identical to that shown in FIG. 12.
  • the electric configuration of the personal computer 1 A is identical to that shown in FIG. 16.
  • FIG. 18 is a diagram showing the overall flow of reading voice data from a voice memory and recognizing voice which is followed by the dictation system.
  • FIG. 19 is a flowchart describing processing of voice recognition carried out by the dictation system.
  • When the processing is started, voice data recorded in units of a file is read from a voice memory 61 in the miniature card 10 A or internal recording medium 53, and Decoding 62 is executed (step S 71).
  • The result of Decoding 62 is sent to Voiceful-or-voiceless Judgment 63 and Sample Absolute Value Averaging 64.
  • Voiceful-or-voiceless Judgment 63 calculates a threshold value used for voiceful-or-voiceless judgment (step S 72 ). Based on the calculated threshold value, whether voice data is voiceful or voiceless is judged (step S 73 ). This processing will be explained in detail later in conjunction with FIG. 20. The result of voiceful-or-voiceless judgment 63 is sent to Sample Absolute Value Averaging 64 .
  • Sample Absolute Value Averaging 64 and Gain Calculation 65 are executed to calculate a gain (step S 74 ). This processing will be described in conjunction with FIG. 21 later. Based on a gain calculated by Gain Calculation 65 , Gain Multiplication 66 amplifies an output of Decoding 62 (step S 75 ).
  • Voice data adjusted to a proper level by Gain Multiplication 66 is sent to Voice Recognition 67 , whereby voice recognition is carried out (step S 76 ).
  • Character conversion is carried out for converting the result of voice recognition into character codes (step S 77 ). Resultant character codes are output and displayed 68 in a screen on the display 3 A or the like (step S 78 ).
  • FIG. 20 is a flowchart describing the contents of processing relevant to voiceful-or-voiceless judgment performed at steps S 72 and S 73 .
  • A variable f indicating a count of the number of frames is initialized to 0 (step S 81).
  • A level of frame energy e(f) is calculated according to an illustrated formula (step S 83).
  • s(i) denotes an input signal of the (i ⁇ 1)-th sample out of one frame
  • N denotes the number of frames constituting one file.
  • It is then judged from the variable f whether or not the frame to be treated is the initial frame (step S 84). If the variable f is 1, a variable min indicating a minimum level of frame energy is set to e(1) (step S 86).
  • If it is found at step S 84 that the variable f is not 1, it is judged whether or not the level of frame energy e(f) is smaller than the variable min (step S 85). If the level of frame energy e(f) is smaller, the variable min is set to the level of frame energy e(f) (step S 87). By contrast, if the level of frame energy e(f) is not smaller, nothing is done and control is passed to the next step S 88.
  • It is then judged whether or not the file has come to an end (step S 88). If the file has not come to an end, control is returned to step S 82 and the foregoing processing is repeated.
  • If it is judged at step S 88 that the file has come to an end, a product of the variable min by a given value a (for example, 1.8) is set as a threshold value trs (step S 89). The processing is then exited.
  • This procedure of setting a threshold value makes the most of the fact that voice data is already recorded. Since the threshold value can be determined on the basis of the minimum energy level of the whole file, voiceful-or-voiceless judgment can be achieved with little error.
  • In the above description, the minimum value is taken over all read intervals, that is, all the frames constituting a voice file.
  • However, the present invention is not limited to this mode. Instead of the minimum value over all the intervals, a minimum value over an interval of a certain length will do.
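The threshold procedure of FIG. 20 can be sketched in Python as follows. The exact formula for the frame energy e(f) is given only in the drawing, so the sketch assumes the common definition as the sum of squared samples of a frame; that definition, and the names `frame_energy`, `voiced_threshold`, and `is_voiceful`, are assumptions for illustration. The default factor of 1.8 is the example value given in the text.

```python
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    # Assumed definition of e(f): sum of squared samples in one frame.
    return sum(s * s for s in samples)


def voiced_threshold(frames: Sequence[Sequence[float]], a: float = 1.8) -> float:
    # trs = a * min over all frames of e(f), as in steps S 84 to S 89.
    if not frames:
        raise ValueError("at least one frame is required")
    return a * min(frame_energy(frame) for frame in frames)


def is_voiceful(frame: Sequence[float], trs: float) -> bool:
    # A frame is judged voiceful when its energy exceeds the threshold.
    return frame_energy(frame) > trs
```

Because the minimum is taken over the whole recorded file, the threshold follows the background noise floor of that particular recording. One consequence of this design is that a file containing a frame of exact digital silence yields a threshold of 0, so every non-silent frame is then judged voiceful.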
  • FIG. 21 is a flowchart describing the contents of gain calculation to be performed at step S 74 in FIG. 19.
  • First, a variable f indicating a count of the number of frames, a variable SumAbs indicating a sum of absolute values of samples, and a variable Cnt indicating the number of additions are each initialized to 0.
  • The variable f is then incremented (step S 92). It is judged whether or not the level of frame energy e(f) calculated within the processing described in FIG. 20 is larger than the threshold value trs (step S 93). If the level of frame energy e(f) is larger than the threshold value trs, the sum of absolute values of samples of the frame is added to the variable SumAbs (step S 94), and the variable Cnt is incremented (step S 95).
  • If it is found at step S 93 that the level of frame energy e(f) is equal to or smaller than the threshold value, control is passed to the next step S 96.
  • It is then judged whether or not the file has come to an end (step S 96). If the file has not come to an end, control is returned to step S 92 and the foregoing processing is repeated.
  • If it is judged at step S 96 that the file has come to an end, the variable SumAbs is divided by the variable Cnt in order to calculate an average value, average, of the absolute values of samples of frames (step S 97).
  • A given value LEV is divided by the average value, average, in order to calculate a gain, gain (step S 98).
  • the given value LEV is set to the average value of the predicted absolute values of samples. For example, an average value of absolute values of voice samples used to learn voice data by a voice recognizer is employed.
  • According to the third embodiment, already-recorded voice data can be adjusted to a sound level suitable for voice recognition.
  • Voice recognition can therefore be carried out on a stable basis irrespective of a sound level of recorded voice data. This results in a high-quality dictation system.
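The gain calculation of FIG. 21 and the subsequent gain multiplication can be sketched in Python as follows. The sketch follows the text literally: SumAbs accumulates the per-frame sums of absolute sample values of voiceful frames, and is divided by the number of such frames, so LEV is assumed to be expressed in the same per-frame unit. The frame energy is again assumed to be the sum of squared samples, and returning a gain of 1.0 when no frame is voiceful is an assumption, since the text does not cover that case. All function names are hypothetical.

```python
from typing import List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    # Assumed definition of e(f): sum of squared samples in one frame.
    return sum(s * s for s in samples)


def compute_gain(frames: Sequence[Sequence[float]], trs: float, lev: float) -> float:
    sum_abs = 0.0  # SumAbs: sum of absolute sample values over voiceful frames
    cnt = 0        # Cnt: number of voiceful frames added
    for frame in frames:
        if frame_energy(frame) > trs:              # step S 93
            sum_abs += sum(abs(s) for s in frame)  # step S 94
            cnt += 1                               # step S 95
    if cnt == 0 or sum_abs == 0.0:
        return 1.0  # assumption: leave the level unchanged
    average = sum_abs / cnt                        # step S 97
    return lev / average                           # step S 98


def apply_gain(samples: Sequence[float], gain: float) -> List[float]:
    # Gain multiplication applied to the decoded samples (step S 75 in FIG. 19).
    return [s * gain for s in samples]
```

Restricting the average to voiceful frames keeps long pauses in a dictation from pulling the average down and inflating the gain.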

Abstract

The present invention causes a computer to read a voice recognition program from a first recording medium and to read voice data from a second recording medium, and causes a CPU in the computer to recognize voice represented by the read voice data according to the voice recognition program, convert the result of voice recognition into text data, and display the converted text data on a display unit.
Also included is a check mark button used by a speaker to designate a portion of voice data, which is input through a microphone, corresponding to an unnecessary word or the like. The portion of the voice data in which a check mark is inscribed is not regarded as an object of voice recognition. Only the other portion of the voice data in which the check mark is not inscribed is regarded as an object of voice recognition, and voice recognition is thus carried out.
Furthermore, the sound level of a voiceful portion of voice data is rated. The gain of the voice data is adjusted according to the rated level. On the basis of the voice data whose sound level has been adjusted, voice recognition is carried out.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0001]
  • The present invention relates to a voice recognition apparatus and a recording medium having a voice recognition program recorded therein. More particularly, this invention is concerned with a voice recognition apparatus for recognizing voice data, and a recording medium in which a voice recognition program causing a computer to recognize voice data is recorded. [0002]
  • 2. Description of the Related Art [0003]
  • In recent years, research and development of a voice recognition technology has been undertaken in earnest. A technological means capable of recognizing voice in real time has been proposed. This kind of technology has been adapted to various kinds of products or usages, for example, reservation of tickets by telephone or voice commanding within car navigation. [0004]
  • Along with a recent breakthrough in voice recognition technology and improvement in performance of personal computers, a technology for documenting voice input through a microphone connected to a personal computer by recognizing the voice within application software running in the personal computer, and displaying the document has been developed. [0005]
  • An example of a software package enabling voice recognition is a product “Voice Type 3.0 for Windows 95” released recently by IBM Ltd. This product converts voice input through a microphone into text data in real time and enjoys a considerably high recognition ratio. [0006]
  • However, the application software permits real-time input through a microphone that is only one means for inputting voice data. An already existent voice file cannot be recognized directly. [0007]
  • One object of development of the aforesaid voice recognition technology is to realize a so-called voice word processor or a dictation system for automatically creating a document on the basis of voice data input by performing dictation, and displaying the document in a screen or the like. [0008]
  • In a conventionally adopted method, the contents of a document to be created are dictated and temporarily recorded by a recording apparatus such as a tape recorder, and a secretary, typist, or the like then reproduces the dictated contents and documents them using a documentation apparatus such as a typewriter or word processor. This style has been generally adopted as one form of effective utilization of a recording apparatus such as a tape recorder. [0009]
  • As for such dictational recording, a technique of appending an index mark or end mark to voice data so as to give instructions to a secretary or typist has been known in the past. According to a prior art of appending such a mark, a desired region of voice data is not designated as an interval but a specified region of voice data is designated as a point. [0010]
  • In the foregoing form of utilization in which a recording apparatus is used for dictation, the birth of a technology for automatically converting the contents of a record into a document has been greatly demanded in the past. [0011]
  • In actual dictation, a word irrelevant to contents to be informed may be contained. For example, when written sentences are recited, an incorrectly uttered word or a word having no meaning such as “Ah” or “Well” (hereinafter an unnecessary word) may be contained (frequently in some cases). [0012]
  • In this case, the performance of voice recognition deteriorates. This leads to a drawback that a document displayed in a screen contains many mistakes. A technology for constructing a dictation system by taking account of the above unnecessary words and creating language models that cover all words including the unnecessary words and that are intended to be used for voice recognition has been proposed in the past. [0013]
  • For example, according to Japanese Unexamined Patent Publication No. 7-5893, there is provided a voice recognition apparatus comprising: a standard pattern memory means for storing standard patterns; an unnecessary word pattern memory means for storing patterns of unnecessary words; a word spotting means for spotting as a word or word-spotting a standard pattern stored in the standard pattern memory means or a pattern of an unnecessary word stored in the unnecessary word pattern memory means on the basis of input voice, and outputting a corresponding interval and score; a producing means for hypothesizing the contents of uttered voice and producing a representation of the meaning; and an analyzing means for analyzing the result of word-spotting, which is performed by the word spotting means, on the basis of the representation of the meaning of the hypothesis produced by the producing means. The analyzing means allocates a score resulting from word-spotting performed on the pattern of an unnecessary word to remaining intervals, of which corresponding standard patterns or patterns of an unnecessary word have not been word-spotted, among all the intervals of data items constituting the voice. The result of word-spotting performed by the word spotting means is then analyzed. [0014]
  • However, the voice recognition apparatus described in the Japanese Unexamined Patent Publication No. 7-5893 has difficulty in carrying out practical processing within an existing computer (especially a computer of a personal level) because the data size of language models becomes enormous. [0015]
  • When using a currently commercialized product, a speaker must take care not to utter an unnecessary word or the like, and therefore cannot help finding the product awkward to use. [0016]
  • For improving the performance of voice recognition, it is required that the sound level of input voice is proper. Currently, it is hard to guarantee a high recognition ratio over a wide range of sound levels from a low level to a high level. A system is therefore designed to provide a maximum recognition ratio relative to an average sound level of voice. [0017]
  • In a voice recognition apparatus of a mode in which voice is input through a microphone as mentioned above, a sound-level meter for indicating a sound level of voice is displayed in, for example, a screen or the like so that a speaker himself/herself can manage his/her sound level of voice properly. [0018]
  • As an example of this technology, Japanese Unexamined Patent Publication No. 5-231922 describes a sound pressure level display for a voice recognition apparatus comprising: a first sound receiver for receiving a voice signal; a second sound receiver for receiving a noise whose level is close to that of the voice signal received by the first sound receiver; a sound pressure level ratio calculating means for calculating a ratio of a sound pressure level of a voice signal input to the first sound receiver to a sound pressure level of a noise input to the second sound receiver; and a display means for displaying the ratio of sound pressure levels calculated by the sound pressure level ratio calculating means. [0019]
  • However, it is annoying for a speaker to manage his/her own voice so that the sound level will become proper. There is therefore an increasing demand for a user-friendly voice recognition apparatus. Moreover, since the sound level of input voice cannot be detected using already recorded voice data, the technology disclosed in the Japanese Unexamined Patent Publication No. 5-231922 cannot be adapted as it is. It cannot be judged whether or not the sound level of voice data is suitable for voice recognition. Besides, since the sound pressure level display is not provided with a facility for adjusting a sound level of voice autonomously, a voice recognition ratio may vary abruptly depending on a sound level indicated by recorded voice data. [0020]
  • OBJECTS AND SUMMARY OF THE INVENTION
  • The first object of the present invention is to provide a voice recognition apparatus for recognizing voice represented by voice data recorded in a given recording medium and a recording medium in which a voice recognition program is recorded. [0021]
  • The second object of the present invention is to provide a voice recognition apparatus capable of treating an unnecessary word or the like contained in voice without the need of especially fast processing, and a recording medium in which a voice recognition program is recorded. [0022]
  • The third object of the present invention is to provide a voice recognition apparatus capable of recognizing voice on a stable basis irrespective of a sound level indicated by recorded voice data, and a recording medium in which a voice recognition program is recorded. [0023]
  • Briefly, a voice recognition apparatus in accordance with the present invention for recognizing voice within a programmed computer comprises a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded, a voice recognizing means for recognizing voice represented by the voice data so as to convert the voice into text data, and a display means for displaying the text data. [0024]
  • A recording medium in accordance with the present invention having a voice recognition program recorded therein is used to run the voice recognition program in a computer, whereby the voice recognition program causes the computer to read voice data from a voice data recording medium in which the voice data is recorded, recognize voice represented by the voice data so as to convert the voice into text data, and display the text data. [0025]
  • These objects and advantages of the present invention will become further apparent from the following detailed explanation. [0026]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram schematically showing the configuration of a computer that is the first embodiment of a voice recognition apparatus in accordance with the present invention; [0027]
  • FIG. 2 is a flowchart describing the first example (first voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment; [0028]
  • FIG. 3 is a diagram showing an example of display appearing when voice recognition application software read from the first recording medium is activated in the computer of the first embodiment, or a main screen used to reproduce compressed voice data; [0029]
  • FIG. 4 is a diagram showing an example of a screen in which text data is displayed when the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment; [0030]
  • FIG. 5 is a diagram showing an example of a dialog box screen used to set a time interval between voice recognitions and the number of displayed words when a given number of words are recognized at intervals of a given time since the start of a file subjected to voice recognition, after the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment; [0031]
  • FIG. 6 is a diagram showing an example of a screen in which a given number of words recognized at intervals of a given time since the start of a file subjected to voice recognition after the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment is displayed; [0032]
  • FIG. 7 is a flowchart describing a second example (second voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment; [0033]
  • FIG. 8 is a flowchart describing a third example (third voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment; [0034]
  • FIG. 9 is a diagram showing an example of a dialog box screen used to set a word to be retrieved for voice recognition when only a word that must be recognized in voice and contained in a voice compressed file is recognized in voice after the voice recognition application software read from the first recording medium is activated in the computer of the first embodiment; [0035]
  • FIG. 10 is a flowchart describing a fourth example (fourth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment; [0036]
  • FIG. 11 is a flowchart describing a fifth example (fifth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, and run in the first embodiment; [0037]
  • FIG. 12 is a conceptual diagram showing the overall configuration of a dictation system of the second embodiment of the present invention; [0038]
  • FIG. 13 is a block diagram showing the electrical configuration of a digital recorder of the second embodiment; [0039]
  • FIG. 14 is a diagram showing a scene in which a check mark button of the digital recorder is handled during dictation in the second embodiment; [0040]
  • FIG. 15 is a diagram showing the format of data to be recorded in a voice memory of a miniature card by means of the digital recorder of the second embodiment; [0041]
  • FIG. 16 is a block diagram showing the electrical configuration of a personal computer of the second embodiment; [0042]
  • FIG. 17 is a flowchart describing voice recognition carried out in the personal computer of the second embodiment; [0043]
  • FIG. 18 is a diagram showing an overall flow of reading voice data from a voice memory and recognizing voice which is followed by the dictation system of the third embodiment of the present invention; [0044]
  • FIG. 19 is a flowchart describing voice recognition carried out by a dictation system of the third embodiment of the present invention; [0045]
  • FIG. 20 is a flowchart describing the contents of processing relevant to judgment of voice or voiceless which is briefed in FIG. 19; and [0046]
  • FIG. 21 is a flowchart describing the contents of gain calculation briefed in FIG. 19. [0047]
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Referring to the drawings, embodiments of the present invention will be described below. [0048]
  • FIG. 1 is a block diagram schematically showing the configuration of a computer that is the first embodiment of a voice recognition apparatus in accordance with the present invention. [0049]
  • A computer 1 consists, as shown in FIG. 1, mainly of: a central processing unit (CPU) 1 a responsible for control of the whole computer 1; a first input unit 5 in which an external recording medium (first recording medium 7) having a given program recorded therein can be mounted freely; a first recording medium driver 6, incorporated in the first input unit 5, for reading a given program from the first recording medium 7 under the control of the CPU 1 a when the first recording medium 7 is mounted in the first input unit 5; a second input/output unit 8 in which an external recording medium (second recording medium 10) having given voice data recorded therein can be mounted freely; a second recording medium driver 9, incorporated in the second input/output unit 8, for reading given voice data and writing given data from and in the second recording medium 10 under the control of the CPU 1 a when the second recording medium 10 is mounted in the second input/output unit 8; an operation unit 2 for inputting a given instruction entered by a user; a display unit 3 serving as a display means for displaying given data after given processing is carried out by the CPU 1 a; and a voice output unit 4 for outputting produced voice after given processing is carried out by the CPU 1 a. [0050]
  • The computer 1 is configured to permit operation of an operating system (OS) capable of executing a plurality of application software programs concurrently (multitasking). Hereinafter, a description will be made on the assumption that the OS is installed in the computer 1. [0051]
  • The first recording medium 7 is a recording medium in which a given voice recognition program is recorded. In this embodiment, a portable recording medium such as, for example, a CD-ROM or floppy disk is assumed as the recording medium. [0052]
  • Moreover, the second recording medium 10 is a voice data recording medium in which given voice data is recorded. The second recording medium 10 will be described below. [0053]
  • The second recording medium 10 is a recording medium in which voice data acquired by an external solid-state recorder is recorded. In this embodiment, a card-shaped recording medium that is a flash memory is assumed. [0054]
  • In recent years, there has been an increasing demand for a flash memory. Digital solid-state recorders using the flash memory as a recording medium have been commercialized. The flash memory is known in many types of card-shaped recording media. For example, a memory card conformable to the PCMCIA standard, a miniature card manufactured by Intel Corp., an SSDFC manufactured by Toshiba Co., Ltd., and a compact flash memory manufactured by SunDisk Co., Ltd. are known. [0055]
  • In general, these card-shaped flash memories are connected to a personal computer via an adaptor or the like, and capable of transferring given data. Many of the existing card-shaped memories have a storage capacity ranging from 2M bytes to 8M bytes. Moreover, the digital solid-state recorders currently on the market include those capable of recording sound in a card having a storage capacity of 2M bytes for 20 min. to 40 min. [0056]
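  • As a rough check on the figures above, the average bit rate implied by recording 20 min. to 40 min. of sound in a 2M-byte card can be worked out as follows. This is only an illustrative calculation: it assumes that "2M bytes" means 2 x 1024 x 1024 bytes and that the whole card is available for voice data, neither of which is stated in the text.

```python
def average_bitrate_kbps(capacity_bytes: int, minutes: float) -> float:
    """Average bit rate (kbit/s) that fills capacity_bytes in the given time."""
    return capacity_bytes * 8 / (minutes * 60) / 1000


# Assumption: "2M bytes" is taken as 2 MiB, all of it usable for voice data.
CARD_BYTES = 2 * 1024 * 1024

for minutes in (20, 40):
    rate = average_bitrate_kbps(CARD_BYTES, minutes)
    print(f"{minutes} min -> {rate:.1f} kbit/s")
# prints:
# 20 min -> 14.0 kbit/s
# 40 min -> 7.0 kbit/s
```

  • Rates of this order are far below that of uncompressed PCM, which is consistent with the recorders compressing the voice data before writing it to the card.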
  • The solid-state recorders convert an analog signal input through a microphone into digital pulse-code-modulated (PCM) data or the like, compress the PCM data according to an encoding algorithm based on ADPCM or CELP, and record the compressed data in a flash memory card. The recorded data can then be read directly by a personal computer via an adaptor. [0057]
  • The computer 1 of this embodiment reads voice data from the flash memory card (second recording medium 10) mounted as mentioned above. [0058]
  • Next, the voice recognition operation carried out by the computer 1 to recognize voice represented by voice data will be described. [0059]
  • To begin with, a user mounts a recording medium (first recording medium 7), in which a given voice recognition program is recorded, in the first input unit 5 of the computer 1. The computer 1 reads the given voice recognition program, which is application software, from the mounted first recording medium 7 into an internal memory, which is not shown, via the first recording medium driver 6. This causes the CPU 1 a to control a voice recognition operation following the program. [0060]
  • Now, the voice recognition operation to be carried out according to the voice recognition program will be described. [0061]
  • FIG. 2 is a flowchart describing the first example (first voice recognition program) of a voice recognition program recorded in the recording medium in accordance with the present invention having the voice recognition program recorded therein. [0062]
  • When the second recording medium 10 is mounted in the computer 1, the CPU 1 a reads voice data from a voice-compressed file containing voice data compressed and recorded by an external solid-state recorder (step S1). The first voice recognition program stretches the compressed voice data into PCM data by reversing the compression algorithm used by the solid-state recorder for recording (step S2). In other words, processing identical to reproduction performed by the solid-state recorder is carried out by the computer 1 under the control of the first voice recognition program. [0063]
  • The PCM data stretched at step S2 is subjected to voice recognition (step S3). The voice-recognized data is converted into text data (step S4), and the converted text data is displayed on a display (display unit 3) (step S5). This processing is continued until the voice-recognized data comes to an end (step S6). [0064]
  • FIG. 3 shows an example of the display appearing when the voice recognition application software read from the first recording medium 7 is activated in the computer 1 of this embodiment, that is, a main screen used to reproduce voice data that is compressed data representing voice. [0065]
  • FIG. 3 shows a main screen 11 in which the following are displayed: a menu bar 12 used to select file-related handling or editing-related handling; a tool button bar 13 presenting various kinds of handling in the form of easily discernible icons; a voice file list box 14 which displays a list of information such as names of voice files transferred from the second recording medium 10, recording times, dates of recording, and priorities and in which a voice file whose data is reproduced or voice-recognized is highlighted in contrast with the other voice files; and a reproduction control 18 used to carry out processing such as replay, stop, fast forward, or fast rewind. [0066]
  • The tool button bar 13 is provided with a voice recognition tool button group 21 consisting of a voice recognition start button 22, word recognition button 23, and list display button 24. [0067]
  • Moreover, the reproduction control 18 is provided with a current position-of-reproduction indicator slider 15, lines 16, and an index search button 17. [0068]
  • In the main screen 11 shown in FIG. 3, when the voice recognition start button 22 belonging to the voice recognition tool button group 21 included in the tool button bar 13 is pressed, voice recognition of the voice file highlighted in the voice file list box 14 is started and a text editor shown in FIG. 4 is started up. Recognized voice data is displayed as serial text data in the editor screen. [0069]
  • Next, a processing operation of recognizing a given number of words at intervals of a given time since the start of a file subjected to voice-recognition and displaying a list of the words will be described. [0070]
  • The list display button 24 belonging to the voice recognition tool button group 21 is a button used to recognize a certain number of words at intervals of a certain time since the start of a file subjected to voice recognition, and display the words in the form of a list. [0071]
  • When the list display button 24 is pressed, a dialog box shown in FIG. 5 appears. A user is prompted to enter the time interval in seconds at which words will be recognized, counted from the start of the file (file subjected to voice recognition) highlighted in the voice file list box 14, and the number of words to be recognized and displayed. If the user wants to suspend the processing, he/she presses a cancel button shown in FIG. 5. Control is then returned to the main screen shown in FIG. 3. [0072]
  • When the user enters the setting of the time interval and the setting of the number of words to be recognized and presses the start button, the dialog box shown in FIG. 5 is closed and a list box shown in FIG. 6 appears. [0073]
  • FIG. 7 is a flowchart describing the second example (second voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein. Herein, a processing operation of recognizing a given number of words at intervals of a certain time since the start of a file subjected to voice recognition, and displaying the words in the form of a list is described. [0074]
  • Specifically, when the user sets the time interval and the number of words to be recognized, and then presses the start button, voice data is first read from a file subjected to voice recognition and recorded in the second recording medium 10 (step S11). The second voice recognition program stretches the compressed voice data in the same manner as the first voice recognition program (step S12). If a word coincident with a time instant when the set time has elapsed is detected (step S13), stretched PCM data starting with the word is voice-recognized (step S14). [0075]
  • The voice-recognized data is converted into text data (step S15), and the converted text data is, as shown in FIG. 6, displayed by the given number of words on the display (display unit 3). Specifically, in the list box shown in FIG. 6, the position-of-reproduction time elapsed since the start of the voice-recognized file and the text data starting at that position are displayed in sequence, with as many words per entry as were set in the dialog box shown in FIG. 5. This processing is terminated when data comes to an end (step S17). [0076]
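  • The list display described above can be sketched as follows. This is a minimal illustration, not the program of the embodiment: it assumes a hypothetical recognizer that has already produced a time-sorted list of (start time, word) pairs, whereas the second voice recognition program recognizes only the PCM data starting at each detected word. The name `list_words_at_intervals` and the data layout are assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical recognizer output: (start time in seconds, recognized word),
# sorted by start time.
TimedWord = Tuple[float, str]


def list_words_at_intervals(
    words: List[TimedWord], interval_s: float, n_words: int
) -> List[Tuple[float, List[str]]]:
    """For each multiple of interval_s, return up to n_words words that
    start at or after that position (the role of steps S13 to S15)."""
    if not words or interval_s <= 0 or n_words <= 0:
        return []
    starts = [t for t, _ in words]
    rows = []
    k = 0
    while k * interval_s <= starts[-1]:
        position = k * interval_s
        first = bisect_left(starts, position)  # first word at or after position
        rows.append((position, [w for _, w in words[first:first + n_words]]))
        k += 1
    return rows


words = [(0.0, "the"), (0.4, "sky"), (0.9, "is"), (1.5, "blue"),
         (10.2, "and"), (10.6, "the"), (11.0, "ocean")]
for position, entry in list_words_at_intervals(words, 10.0, 2):
    print(f"{position:6.1f} s  {' '.join(entry)}")
```

  • Each printed row corresponds to one entry of the list box in FIG. 6: a position of reproduction followed by the set number of words starting there.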
  • Next, a processing operation of recognizing voice started at a given position in a file subjected to voice recognition will be described. [0077]
  • When the position of reproduction indicated by the current position-of-reproduction indicator slider 15 in the main screen 11 shown in FIG. 3 is changed and the voice recognition start button 22 belonging to the voice recognition tool button group 21 is then pressed, voice recognition is started at the changed position of reproduction. The result of voice recognition then appears in the text editor screen shown in FIG. 4. [0078]
  • FIG. 8 is a flowchart describing the third example (third voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, wherein a processing operation of starting voice recognition at a given position in a file subjected to voice recognition and displaying the result is described. [0079]
  • Specifically, when a user changes the position of reproduction indicated by the current position-of-reproduction indicator slider 15 shown in FIG. 3, voice data is read from a file subjected to voice recognition in the second recording medium (step S21). The third voice recognition program stretches compressed voice data in the same manner as the first voice recognition program (step S22). If a word coincident with a given position is detected (step S23), stretched PCM data starting with the word at the given position is voice-recognized (step S24). [0080]
  • The voice-recognized data is converted into text data (step S25), and the converted text data is displayed on the display (display unit 3) (step S26). In other words, text data starting at the given position is displayed in the editor screen shown in FIG. 4. This processing is terminated when data comes to an end. [0081]
  • Next, a processing operation of voice-recognizing a desired word, which should be voice-recognized, among those contained in a file subjected to voice recognition, and indicating the positions of the desired word will be described. [0082]
  • The word recognition button 23 belonging to the voice recognition tool button group 21 shown in FIG. 3 is a button for use in voice-recognizing a desired word, which should be voice-recognized, among those contained in a file subjected to voice recognition, and indicating the positions of the desired word. Specifically, when the word recognition button 23 is pressed, only the word that should be voice-recognized is retrieved from a voice-compressed file by carrying out voice recognition. Retrieved locations are indicated with the lines 16 in the current position-of-reproduction indicator slider 15 so that they can be discerned at a glance. The details will be described below. [0083]
  • When the word recognition button 23 is pressed, the dialog box shown in FIG. 9 appears. With the dialog box, a user is prompted to enter a specified word that should be recognized. To suspend this processing, the user presses the cancel button. The processing is then exited and control is returned to the main screen shown in FIG. 3. [0084]
  • FIG. 10 is a flowchart describing the fourth example (fourth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, wherein a processing operation of voice-recognizing desired words alone, which should be voice-recognized, among those contained in a file subjected to voice recognition, and indicating the positions of the desired words is described. [0085]
  • Specifically, after a desired word that should be recognized is entered in the screen shown in FIG. 9 by a user, when the start button is pressed, voice data is read from a file subjected to voice recognition in the second recording medium (step S31). The fourth voice recognition program stretches compressed voice data in the same manner as the first voice recognition program (step S32). Voice recognition is then started at the start of the selected voice-compressed file (step S33). [0086]
  • Thereafter, when the word registered in the dialog box shown in FIG. 9 is recognized from among those contained in the file subjected to voice recognition (step S34), the positions of the word are indicated with the lines 16 in the current position-of-reproduction indicator slider 15 in the main screen 11 shown in FIG. 3. An index mark is inserted into the voice data item coincident with each position. Every time the index search button 17 in the reproduction control 18 in the main screen 11 shown in FIG. 3 is pressed, control is skipped sequentially to one of the positions indicated with the lines 16 (step S35 and step S36). This facility is available not only when reproduction is stopped but also when reproduction is under way. [0087]
  • When voice recognition has been completed through the end of the voice-compressed file, all the positions at which the registered word is found are indicated with the lines 16 in the current position-of-reproduction indicator slider 15. [0088]
  • This processing is terminated when data comes to an end (step S37). [0089]
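  • The keyword indexing and index search described above can be sketched as follows. This is a minimal illustration under stated assumptions: a hypothetical recognizer has already produced time-sorted (start time, word) pairs, positions are expressed in seconds rather than as index marks written into the voice data, and the function names are invented for the example. What happens when the index search button is pressed after the last position is not stated in the text; the sketch simply returns `None`.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

TimedWord = Tuple[float, str]  # hypothetical: (start time in seconds, word)


def keyword_positions(words: List[TimedWord], keyword: str) -> List[float]:
    """Positions at which the registered word was recognized (step S34).
    These correspond to the lines 16 drawn in the slider 15."""
    target = keyword.lower()
    return [t for t, w in words if w.lower() == target]


def next_index(positions: List[float], current_s: float) -> Optional[float]:
    """Position to skip to when the index search button 17 is pressed:
    the first indexed position strictly after the current one."""
    i = bisect_right(positions, current_s)
    return positions[i] if i < len(positions) else None


words = [(0.0, "the"), (0.4, "sky"), (0.9, "is"), (1.5, "blue"),
         (2.1, "and"), (2.4, "the"), (2.8, "ocean"), (3.3, "is"),
         (3.6, "also"), (4.0, "blue")]
marks = keyword_positions(words, "blue")
print(marks)                   # [1.5, 4.0]
print(next_index(marks, 0.0))  # 1.5
print(next_index(marks, 1.5))  # 4.0
```

  • Because the lookup depends only on the current position, the same function serves whether reproduction is stopped or under way.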
  • Next, a processing operation of deleting a portion of voice data corresponding to a designated portion of text data from a file subjected to voice recognition will be described. [0090]
  • FIG. 11 is a flowchart describing the fifth example (fifth voice recognition program) of a voice recognition program recorded in a recording medium in accordance with the present invention having the voice recognition program recorded therein, wherein a processing operation of deleting a portion of voice data corresponding to a designated portion of text data from the second recording medium 10 is described. [0091]
  • First, voice data is read from a file subjected to voice recognition in the second recording medium 10 (step S41). The fifth voice recognition program stretches compressed voice data in the same manner as the first voice recognition program (step S42). The stretched PCM data is voice-recognized (step S43). [0092]
  • The voice-recognized data is converted into text data (step S44). Addresses in the second recording medium 10 associated with words are detected and then listed (step S45). Table 1 indicates the addresses in the second recording medium 10 allocated to an example of text data “The sky is blue and the ocean is also blue.” [0093]
    TABLE 1
    Leading and last addresses in a recording medium for each text word

    No.   Text word   Leading address   Last address
    1     the         3468H             3492H
    2     sky         3494H             3560H
    3     is          3580H             3600H
    4     blue        3610H             3620H
    5     and         3622H             3640H
    6     the         3692H             3699H
    7     ocean       3706H             3720H
    8     is          3724H             3736H
    9     also        3740H             3753H
    10    blue        3760H             3770H
  • Thereafter, the above text data is kept displayed on the display until the data comes to an end (step S46 and step S47). [0094]
  • When data comes to an end, it is judged whether or not the text data should be deleted (step S48). If the data should be deleted, a position of deletion is designated in the text data (step S49). Addresses in the second recording medium 10 associated with the designated position are retrieved from Table 1 (step S50). [0095]
  • Thereafter, voice data is read from the second recording medium 10 (step S51), and stretched (step S52). The portion of the voice data defined by the addresses is deleted (step S53). Thereafter, the voice data is compressed again (step S54) and then overwritten (step S55). [0096]
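  • The address lookup and deletion of steps S50 and S53 can be sketched as follows. This is only an illustration under simplifying assumptions: the addresses of Table 1 are treated as inclusive byte offsets into a flat buffer, the stretching and recompression of steps S52 and S54 are omitted, and the listed ranges are assumed not to overlap. The function name `delete_words` is invented for the example.

```python
from typing import Iterable, List, Tuple

# (text word, leading address, last address), copied from Table 1.
ADDRESS_TABLE: List[Tuple[str, int, int]] = [
    ("the", 0x3468, 0x3492), ("sky", 0x3494, 0x3560),
    ("is", 0x3580, 0x3600), ("blue", 0x3610, 0x3620),
    ("and", 0x3622, 0x3640), ("the", 0x3692, 0x3699),
    ("ocean", 0x3706, 0x3720), ("is", 0x3724, 0x3736),
    ("also", 0x3740, 0x3753), ("blue", 0x3760, 0x3770),
]


def delete_words(voice: bytes, word_numbers: Iterable[int]) -> bytes:
    """Remove the address ranges of the designated words (1-based numbers
    as in Table 1). Assumes inclusive, non-overlapping ranges."""
    ranges = sorted((ADDRESS_TABLE[n - 1][1], ADDRESS_TABLE[n - 1][2])
                    for n in word_numbers)
    kept = bytearray()
    pos = 0
    for lead, last in ranges:
        kept += voice[pos:lead]  # keep everything before the deleted range
        pos = last + 1           # resume just after the last address
    kept += voice[pos:]
    return bytes(kept)


voice = bytes(i % 256 for i in range(0x4000))  # stand-in for voice data
edited = delete_words(voice, [9])              # delete the word "also"
print(len(voice) - len(edited))                # 20 bytes removed
```

  • In a real implementation the addresses of the words following a deleted range would also have to be updated before a further deletion, since the data shifts forward.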
  • In this embodiment, addresses are listed so that a position of deletion in text data can be associated with a position in the second recording medium. The present invention is not limited to this mode. For example, times passed since the start of a file may be recorded in the form of a list. [0097]
  • The voice recognition program of the first embodiment, recorded in a recording medium to be adapted to a computer, offers the following advantage. Conventionally, when voice input through a microphone is recognized directly, voice recognition must be carried out in real time, so the CPU is required to have a high processing capability. In this embodiment, by contrast, stretching of a voice-compressed file and voice recognition are merely repeated, so real-time processing is not required and the CPU need not have a high processing capability. [0098]
  • Moreover, since real-time processing is not required, there is the advantage that an algorithm permitting voice recognition with high precision can be created. [0099]
  • Furthermore, since the contents of a portion of a voice-compressed file can be discerned at sight, what is recorded at which position of reproduction can be grasped broadly. [0100]
  • Only a portion of an existing voice-compressed file which should be converted into text data can be voice-recognized. [0101]
  • In addition, control can be skipped instantly to the position of a word serving as a keyword in an existing voice-compressed file, so the position of a word that should be retrieved can be reached at once. [0102]
  • Furthermore, even after data is recorded, since a word can be designated later and an index mark can be inscribed in the recorded data, usefulness improves. Besides, even after data is recorded, since an unnecessary word can be designated later and deleted from the recorded data, an unsuccessful dictation can be deleted easily. [0103]
  • In the computer 1 of the first embodiment, the first recording medium 7 is an external recording medium. After a recording medium having a given voice recognition program recorded therein is mounted in the computer 1, the given voice recognition program that is application software can be read from the recording medium. The present invention is not limited to this mode. Any mode will do as long as a given voice recognition program can be run by the CPU 1 a in the computer. [0104]
  • For example, the computer 1 may be provided with a recording medium having a voice recognition program recorded therein in advance so that the voice recognition program can be read any time. [0105]
  • FIGS. 12 to 17 relate to the second embodiment of the present invention. FIG. 12 is a conceptual diagram showing the overall configuration of a dictation system to which the present invention is adapted. [0106]
  • The dictation system comprises, as shown in FIG. 12: a digital recorder 26 that is a voice recording apparatus for converting voice into an electric signal and producing voice data; a miniature card 10A, detachably attached to the digital recorder 26, serving as a voice data recording medium in which voice data is recorded; a PC card adaptor 27 used to insert the miniature card 10A into a PC card slot 9A (See FIG. 16) to be described later for connection; and a personal computer 1A including a display 3A serving as a display means, and a keyboard 2A and mouse 2B serving as an operation unit, and acting as a voice recognition apparatus for processing voice data read from the miniature card 10A through the PC card slot 9A according to a control program 28 or a voice recognition program 29. [0107]
  • FIG. 13 is a block diagram showing the electrical configuration of the digital recorder 26. [0108]
  • The digital recorder 26 comprises, as shown in FIG. 13: a microphone 31 serving as a voice data input means for inputting voice and converting it into an electric signal; a microphone amplifier 32 for amplifying a voice signal sent from the microphone 31 to a proper level; a lowpass filter 33 for removing unnecessary high-frequency components from the voice signal amplified by the microphone amplifier 32; an A/D converter 34 for converting an analog voice signal output from the lowpass filter 33 into digital data; an encoder-decoder 35 for encoding (compressing) the digitized voice signal during a recording operation, and decoding (stretching) encoded data during a reproduction operation; a memory control unit 36 serving as a recording means for controlling recording or reproduction of voice information in or from a voice memory 37, which will be described later, on the basis of address information given by a system control unit 38 to be described later; a voice memory 37 incorporated in the miniature card 10A serving as a voice data recording medium and formed with, for example, a semiconductor memory; a miniature card attachment 44 serving as a recording medium attaching means enabling the miniature card 10A including the voice memory 37 to be freely attached or detached to or from the digital recorder 26; a D/A converter 39 for converting the digital voice signal output from the encoder-decoder 35 into an analog signal; a lowpass filter 40 for removing unnecessary high-frequency components from a voice signal converted into an analog form by the D/A converter 39; a power amplifier 41 for amplifying an analog voice signal output from the lowpass filter 40; a loudspeaker 42 for uttering sound when driven by the power amplifier 41; an operation input unit 43 composed of various kinds of operation buttons including a check mark button 43 a (See FIG. 14) to be described later; and a system control unit 38 that controls the digital recorder 26 including the encoder-decoder 35, memory control unit 36, and voice memory 37 in a centralized manner and that serves as a recording means to which an output terminal of the operation input unit 43 is connected. [0109]
  • FIG. 14 is a diagram showing a scene in which the check mark button of the digital recorder is handled during dictation. [0110]
  • The check mark button 43 a serving as an interval designating means of the operation input unit 43 is, as shown in FIG. 14, located at a position where it can be handled easily by the thumb of the hand holding the digital recorder 26. The check mark button is a button to be pressed in order to append a check mark, which indicates that an uttered word is an unnecessary word, to voice data when an unnecessary word or the like is uttered while the contents of a document to be created are being dictated. [0111]
  • An unnecessary word or the like is uttered unconsciously, but the instant it is uttered, the speaker can recognize it as an unnecessary word. Since the check mark button 43 a is located at a position enabling the speaker to press it easily, a check mark can be appended readily if necessary. [0112]
  • FIG. 15 is a diagram showing the format of data to be recorded in the voice memory 37 in the miniature card 10A by the digital recorder 26. [0113]
  • One record data is managed in the form of a file. In each file, information, for example, a date of recording and a recording time is written as a file header. In the remaining area, data divided into frames is written. [0114]
  • Moreover, each frame includes check mark information indicating whether or not the check mark button 43 a has been pressed, and encoded voice data. The check mark information is structured as a flag that is, for example, 1 bit long. When the check mark button 43 a is pressed, the flag is set to “1.” When the check mark button 43 a is not pressed, the flag is set to “0.” [0115]
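  • The frame structure described above can be sketched as follows. The exact layout of FIG. 15 is not given in the text, so the sketch assumes a purely hypothetical one: fixed-length frames of 21 bytes, in which the lowest bit of the first byte carries the check mark flag and the remaining 20 bytes carry the encoded voice data. The names `Frame`, `pack_frame`, and `parse_frames` are likewise invented for the example, and the file header is assumed to have been stripped already.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical layout: 1 flag byte + 20 bytes of encoded voice per frame.
FRAME_BYTES = 21


@dataclass
class Frame:
    check_mark: bool  # True if the check mark button was pressed
    voice: bytes      # encoded (compressed) voice data of the frame


def pack_frame(check_mark: bool, voice: bytes) -> bytes:
    """Build one frame: flag byte (bit 0 = check mark) then voice data."""
    if len(voice) != FRAME_BYTES - 1:
        raise ValueError("voice payload has the wrong length")
    return bytes([1 if check_mark else 0]) + voice


def parse_frames(body: bytes) -> List[Frame]:
    """Split the frame area of a file into frames; a trailing partial
    frame, if any, is ignored."""
    frames = []
    for off in range(0, len(body) - FRAME_BYTES + 1, FRAME_BYTES):
        chunk = body[off:off + FRAME_BYTES]
        frames.append(Frame(bool(chunk[0] & 0x01), chunk[1:]))
    return frames


body = pack_frame(False, bytes(20)) + pack_frame(True, bytes([7]) * 20)
print([f.check_mark for f in parse_frames(body)])  # [False, True]
```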
  • FIG. 16 is a block diagram showing the electrical configuration of the personal computer 1A. [0116]
  • The personal computer 1A carries out voice reproduction, information display, and the like according to the control program 28, carries out documentation according to the voice recognition program 29, and also carries out various kinds of processing according to various other programs. The personal computer 1A comprises: a CPU 51 serving as a detecting means, a level adjusting means, a voice recognizing means, a voice rating means, a minimum value calculating means, a gain value calculating means, a multiplying means, and an averaging means; a main memory 52 serving as a recording medium offering a work area for the CPU 51; an internal recording medium 53 serving as a recording medium which is formed with, for example, a hard disk or floppy disk and in which the control program 28 and voice recognition program 29 are recorded; an external port 54 used to connect the personal computer to various kinds of external equipment; an interface 55 used to connect the display 3A to the personal computer; an interface 56 used to connect the keyboard 2A or mouse 2B; a loudspeaker 4A that is a voice output unit for uttering sound on the basis of voice data; an interface 57 used to connect the loudspeaker 4A; a PC card slot 9A which serves as a voice data reading means and into which the miniature card 10A attached to the PC card adaptor 27 is inserted; and an interface 58 used to connect the PC card slot 9A. The CPU 51, main memory 52, internal recording medium 53, external port 54, and interfaces 55, 56, 57, and 58 are interconnected over a bus. [0117]
  • Voice data may be read directly from the miniature card 10A via the PC card slot 9A. Alternatively, the voice data may be temporarily recorded in the internal recording medium 53 and read from the internal recording medium 53. Otherwise, the voice data may be read directly from the digital recorder 26 via a communication means or the like. Thus, the voice data reading means is not limited to the PC card slot. [0118]
  • Moreover, an example of screen display attained by running the control program in the personal computer is nearly identical to that shown in FIG. 3. [0119]
  • FIG. 17 is a flowchart describing processing of voice recognition carried out in the personal computer 1A. [0120]
  • The voice recognition is, as mentioned later, carried out stepwise in the order of phoneme recognition, word recognition, and sentence recognition. [0121]
  • Specifically, when the voice recognition start button 22 belonging to the voice recognition tool button group 21 in the tool button bar 13 in the main screen 11 is clicked, voice recognition is started. A voice file highlighted in the voice file list box 14 is read in units of a given frame (step S61), and decoded in units of the frame (step S62). [0122]
  • The decoded voice data is passed to the voice recognition program 29. First, a phoneme is identified (step S63). Word recognition is then carried out, wherein a word stream that matches input voice most satisfactorily is retrieved on the basis of a given language model suggested by the identified phoneme (step S64). [0123]
  • What is referred to as the language model is a model giving a probability of occurrence that suggests a given word stream. As the language model, various forms have been conceived. However, an efficient model taking account of unnecessary words or the like has not been devised yet. [0124]
  • In this embodiment, therefore, check mark information located at the start of each frame shown in FIG. 15 is checked to see if a word represented by data in a frame immediately preceding the frame is an unnecessary word or the like. [0125]
  • Specifically, it is judged whether or not the check mark information is 1 (step S65). If the check mark information is 1, a word represented by data in a frame immediately preceding the frame is not regarded as an object of processing of sentence recognition of the next step (step S66). If the check mark information is 0, sentence recognition is carried out (step S67). [0126]
  • Character conversion is then carried out to convert the voice data into character codes on the basis of the recognized sentence (step S68). The result of recognition is displayed in a screen on the display 3A (step S69). [0127]
  • Thereafter, it is judged whether or not the voice file has come to an end (step S70). If the voice file has not come to an end, control is returned to step S61. If the voice file has come to an end, the processing is terminated. [0128]
  • The processing of excluding an unnecessary word from recognition according to the result of detecting check mark information has been described as being carried out within the voice recognition program 29. The present invention is not limited to this mode. Alternatively, the processing may be carried out within, for example, the control program 28, and the result may be passed to the voice recognition program 29. [0129]
  • In this case, the control program 28 causes the personal computer 1A to fetch voice data from the miniature card 10A, and to detect check mark information appended to the voice data. If the check mark information is 1, the voice data is not passed to the voice recognition program 29. If the check mark information is 0, the voice data is passed to the voice recognition program 29. [0130]
  • Moreover, a word represented by data in a frame immediately preceding a frame including check mark information of 1 has been described to be not regarded as an object of voice recognition. The present invention is not limited to this mode. For example, a word represented by data in a frame including check mark information of 1 may not be regarded as an object of voice recognition. [0131]
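  • The check mark handling described above, together with its variation, can be sketched as follows. This is a deliberately simplified illustration: it assumes one recognized word per frame, whereas a real word spans many frames, and it leaves open how the word in the marked frame itself is treated in the first mode, which the text does not state. The function name `drop_checked_words` and the `mode` parameter are invented for the example.

```python
from typing import List, Tuple

# Hypothetical per-frame result: (recognized word, check mark flag).
MarkedWord = Tuple[str, int]


def drop_checked_words(frames: List[MarkedWord],
                       mode: str = "preceding") -> List[str]:
    """Return the words passed on to sentence recognition.

    mode="preceding": a flag of 1 excludes the word in the frame
                      immediately preceding the marked frame (S65, S66).
    mode="same":      a flag of 1 excludes the word in the marked frame.
    """
    if mode not in ("preceding", "same"):
        raise ValueError("unknown mode")
    skipped = set()
    for i, (_, mark) in enumerate(frames):
        if mark == 1:
            if mode == "same":
                skipped.add(i)
            elif i > 0:
                skipped.add(i - 1)
    return [w for i, (w, _) in enumerate(frames) if i not in skipped]


frames = [("please", 0), ("send", 0), ("um", 0), ("the", 1), ("report", 0)]
print(drop_checked_words(frames))          # ['please', 'send', 'the', 'report']
print(drop_checked_words(frames, "same"))  # ['please', 'send', 'um', 'report']
```

  • The example shows why the mode matters: the speaker typically presses the button just after noticing the unnecessary word, so the flag tends to land one frame late.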
  • Furthermore, the result of voice recognition has been described as being displayed as characters on the display 3A. The present invention is not limited to this mode. For example, the characters may be output as character data to a recording medium or may be displayed and output simultaneously. [0132]
  • The check mark information has been described as being recorded during recording by the digital recorder 26. Alternatively, the system may be configured so that the check mark information can be designated during reproduction by the digital recorder 26 or reproduction by the personal computer 1A. [0133]
  • According to the second embodiment, when a speaker presses the check mark button, a check mark is recorded in voice data. During processing of reproduction and voice recognition, the check mark is detected. A word represented by data in a frame having a check mark inscribed therein or a word represented by data in a frame preceding or succeeding the frame having the check mark inscribed therein is not regarded as an object of voice recognition. Consequently, treatment of an unnecessary word or the like which has not been able to be achieved in the past can be carried out easily without the need of increasing the load of voice recognition, that is, the need of especially fast processing. This results in a good-quality dictation system capable of achieving voice recognition properly and creating a document with few mistakes. [0134]
  • FIGS. 18 to 21 relate to the third embodiment of the present invention. The conceptual overall configuration of a dictation system of the third embodiment is identical to that shown in FIG. 12. Moreover, the electric configuration of the personal computer 1A is identical to that shown in FIG. 16. [0135]
  • Next, FIG. 18 is a diagram showing the overall flow of reading voice data from a voice memory and recognizing voice which is followed by the dictation system, and FIG. 19 is a flowchart describing processing of voice recognition carried out by the dictation system. [0136]
  • As described in FIG. 19, when the processing is started, voice data recorded in units of a file is read from a voice memory 61 in the miniature card 10A or internal recording medium 53, and Decoding 62 is executed (step S71). [0137]
  • The result of Decoding 62 is sent to Voiceful-or-voiceless Judgment 63 and Sample Absolute Value Averaging 64. [0138]
  • Voiceful-or-voiceless Judgment 63 calculates a threshold value used for voiceful-or-voiceless judgment (step S72). Based on the calculated threshold value, whether voice data is voiceful or voiceless is judged (step S73). This processing will be explained in detail later in conjunction with FIG. 20. The result of Voiceful-or-voiceless Judgment 63 is sent to Sample Absolute Value Averaging 64. [0139]
  • Sample [0140] Absolute Value Averaging 64 and Gain Calculation 65 are executed to calculate a gain (step S74). This processing will be described in conjunction with FIG. 21 later. Based on a gain calculated by Gain Calculation 65, Gain Multiplication 66 amplifies an output of Decoding 62 (step S75).
  • Voice data adjusted to a proper level by [0141] Gain Multiplication 66 is sent to Voice Recognition 67, whereby voice recognition is carried out (step S76).
  • Character conversion is carried out for converting the result of voice recognition into character codes (step S[0142] 77). Resultant character codes are output and displayed 68 in a screen on the display 3A or the like (step S78).
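  • The flow of steps S71 to S77 can be summarized in a short Python sketch. The callables `decode`, `compute_gain`, and `recognize` are hypothetical stand-ins for Decoding 62, the gain path (blocks 63 to 65), and Voice Recognition 67 with character conversion; none of these names or signatures come from the specification.

```python
from typing import Callable, List, Sequence


def recognize_file(
    encoded: bytes,
    decode: Callable[[bytes], List[float]],
    compute_gain: Callable[[Sequence[float]], float],
    recognize: Callable[[Sequence[float]], str],
) -> str:
    """Run the read, level-adjust, recognize flow on one voice file."""
    samples = decode(encoded)               # step S71: read and decode
    gain = compute_gain(samples)            # steps S72 to S74: threshold, judgment, gain
    adjusted = [s * gain for s in samples]  # step S75: gain multiplication
    return recognize(adjusted)              # steps S76 and S77: recognition, character codes
```

  • The returned string would then be shown on the display (step S78). Passing the stages in as callables keeps the sketch independent of any particular codec or recognizer.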
  • [0143] FIG. 20 is a flowchart describing the contents of processing relevant to voiceful-or-voiceless judgment performed at steps S72 and S73.
  • [0144] When this processing is started, first, a variable f indicating a count of the number of frames is initialized to 0 (step S81).
  • [0145] After the variable f is incremented (step S82), a level of frame energy e(f) is calculated according to an illustrated formula (step S83). In the formula, s(i) denotes an input signal of the (i−1)-th sample out of one frame, and N denotes the number of frames constituting one file.
  • [0146] It is then judged whether or not the variable f is 1, that is, whether the frame being processed is the first frame (step S84). If the variable f is 1, a variable min indicating a minimum level of frame energy is set to e(1) (step S86).
  • [0147] If it is found at step S84 that the variable f is not 1, it is judged whether or not the level of frame energy e(f) is smaller than the variable min (step S85). If the level of frame energy e(f) is smaller, the variable min is set to the level of frame energy e(f) (step S87). If it is not smaller, control is passed directly to the next step S88.
  • [0148] It is then judged whether or not the file has come to an end (step S88). If the file has not come to an end, control is returned to step S82 and the foregoing processing is repeated.
  • [0149] If it is judged at step S88 that the file has come to an end, the product of the variable min and a given value a (for example, 1.8) is set as a threshold value trs (step S89). The processing is then exited.
  • [0150] This procedure of setting a threshold value makes the most of the fact that the voice data is already recorded. Since the threshold value can be determined on the basis of the minimum energy level of the whole file, voiceful-or-voiceless judgment can be achieved with little error.
  • [0151] As described above, the minimum value over all read intervals (that is, all the frames constituting a voice file) is calculated. The present invention is not limited to this mode. Instead of the minimum value over all the intervals, a minimum value over an interval of a certain length will do.
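  • The threshold procedure of FIG. 20 can be sketched in Python as follows. The illustrated energy formula is not reproduced in this text, so the sketch assumes that frame energy is the sum of squared samples in a frame; the function names, the `frame_len` parameter, and the optional `interval` argument (which restricts the search to the first frames of the file) are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Return e(f) for each complete frame (assumed: sum of squared samples)."""
    energies = []
    # A trailing partial frame, if any, is ignored in this sketch.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def voiced_threshold(
    energies: Sequence[float],
    a: float = 1.8,
    interval: Optional[int] = None,
) -> float:
    """Return trs = a * min(e(f)), as in step S89.

    interval=None searches the whole file; an integer limits the search
    to the first `interval` frames (the shorter-interval variant).
    """
    if interval is not None and interval < 1:
        raise ValueError("interval must be at least 1")
    window = energies if interval is None else energies[:interval]
    if not window:
        raise ValueError("no frames to examine")
    return min(window) * a
```

  • For example, samples `[1, 1, 3, 4, 1, 0, 2, 2]` with a frame length of 2 give frame energies 2, 25, 1, and 8, so the minimum is 1 and the threshold is 1.8 times that value. Because the whole file is available before recognition starts, the minimum can be taken in a single pass, which is the advantage the specification points out.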
  • [0152] Next, FIG. 21 is a flowchart describing the contents of gain calculation to be performed at step S74 in FIG. 19.
  • [0153] When this processing is started, a variable f indicating a count of the number of frames, a variable SumAbs indicating a sum of absolute values of samples, and a variable Cnt indicating the number of additions are initialized to 0 (step S91).
  • [0154] The variable f is then incremented (step S92). It is judged whether or not the level of frame energy e(f) calculated within the processing described in FIG. 20 is larger than the threshold value trs (step S93). If the level of frame energy e(f) is larger than the threshold value trs, the sum of absolute values of samples of the frame is added to the variable SumAbs (step S94), and the variable Cnt is incremented (step S95).
  • [0155] If it is found at step S93 that the level of frame energy e(f) is equal to or smaller than the threshold value, control is passed to the next step S96.
  • [0156] Thereafter, it is judged whether or not the file has come to an end (step S96). If the file has not come to an end, control is returned to step S92 and the foregoing processing is repeated.
  • [0157] If it is judged at step S96 that the file has come to an end, the variable SumAbs is divided by the variable Cnt in order to calculate an average value, average, of the absolute values of samples of frames (step S97).
  • [0158] A given value LEV is divided by the average value, average, in order to calculate a gain, gain (step S98). Herein, the given value LEV is set to the average value of the predicted absolute values of samples. For example, an average value of absolute values of voice samples used to learn voice data by a voice recognizer is employed.
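  • The gain calculation of FIG. 21 can be sketched in Python as follows. As in the text, Cnt counts additions (one per voiceful frame), so `average` here is the mean per-frame sum of absolute values and `lev` must be expressed on the same scale. The sum-of-squares frame energy, the function name `compute_gain`, the `frame_len` parameter, and the fallback gain of 1.0 when no frame exceeds the threshold are assumptions, since the specification does not state them.

```python
from typing import Sequence


def compute_gain(
    samples: Sequence[float],
    frame_len: int,
    trs: float,
    lev: float,
) -> float:
    """Return gain = LEV / average over voiceful frames (steps S91 to S98)."""
    sum_abs = 0.0  # SumAbs: running sum of absolute sample values
    cnt = 0        # Cnt: number of additions, i.e. voiceful frames
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)  # assumed form of e(f)
        if energy > trs:                    # step S93: voiceful frame
            sum_abs += sum(abs(x) for x in frame)  # step S94
            cnt += 1                               # step S95
    if cnt == 0 or sum_abs == 0.0:
        # Not covered by the specification: leave the level unchanged.
        return 1.0
    average = sum_abs / cnt  # step S97
    return lev / average     # step S98
```

  • Multiplying every decoded sample by the returned gain then corresponds to Gain Multiplication 66 (step S75). Excluding voiceless frames from the average keeps long pauses from inflating the gain.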
  • [0159] According to the third embodiment, already-recorded voice data can be adjusted to a sound level suitable for voice recognition. Voice recognition can therefore be carried out on a stable basis irrespective of the sound level of the recorded voice data. This results in a high-quality dictation system.
  • [0160] It is apparent that a wide range of different working modes can be formed on the basis of this invention without departing from the spirit and scope of the invention. This invention is not restricted to any specific embodiment but is limited only by the appended claims.

Claims (23)

What is claimed is:
1. A voice recognition apparatus for recognizing voice within a programmed computer, comprising:
a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded;
a voice recognition means for recognizing voice represented by the voice data and converting it into text data; and
a display means for displaying the text data.
2. A voice recognition apparatus according to claim 1, wherein voice data recorded in said voice data recording medium is compressed digital voice data.
3. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
recognize voice represented by the voice data so as to convert it into text data; and
display the text data.
4. A recording medium having a voice recognition program recorded therein according to claim 3, wherein said voice recognition program further causes the computer to recognize in voice or voice-recognize only a given number of words and convert them into text data at intervals of a given time when causing the computer to recognize voice represented by the voice data and convert it into text data.
5. A recording medium having a voice recognition program recorded therein according to claim 3 or 4, wherein said voice recognition program further causes the computer to voice-recognize only a given number of words starting at a given position in said voice data recording medium having voice data recorded therein and to convert them into text data when causing the computer to recognize voice represented by the voice data and convert it to text data.
6. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
recognize voice represented by the voice data so as to detect a given word; and
indicate the positions of the given word.
7. A recording medium having a voice recognition program recorded therein according to claim 6, wherein said voice recognition program further causes the computer to create an index mark at the positions of the given word in said voice data recording medium having the voice data recorded therein after causing the computer to recognize voice represented by the voice data and detect the given word.
8. A recording medium having a voice recognition program recorded therein according to claim 7, wherein said voice recognition program further causes the computer to reproduce voice data starting at a given position in said voice data recording medium having the voice data recorded therein after causing the computer to indicate the positions of the given word.
9. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
recognize voice represented by the voice data so as to convert it into text data;
display the text data;
enable designation of at least part of the text data using a designation input means; and
delete a portion of the voice data corresponding to a portion of the text data designated using said designation input means from said voice data recording medium, and cancel display of the designated portion of the text data.
10. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
recognize voice represented by the voice data so as to convert it into text data;
acquire position information of positions in said voice data recording medium, at which portions of the voice data corresponding to words of the text data are recorded, in one-to-one correspondence with the words;
display the text data;
enable designation of at least part of the text data using a designation input means;
acquire position information of positions in said voice data recording medium, at which a corresponding portion of the voice data is recorded, according to a word contained in a portion of the text data designated using said designation input means; and
delete the corresponding portion of the voice data from said voice data recording medium having the voice data recorded therein on the basis of the position information, and cancel display of the designated portion of the text data.
11. A voice recognition apparatus, comprising:
a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded;
a detecting means for detecting a check mark that is appended to the voice data and distinguishes an interval within the voice data;
a voice recognition means for not recognizing voice represented by a portion of the voice data associated with the given check mark but recognizing voice represented by the other portion of the voice data; and
a display means for displaying the result of recognition performed by said voice recognition means.
12. A voice recognition apparatus according to claim 11, wherein the check mark is recorded by a voice recording apparatus including: a voice data input means for inputting voice data; an interval designating means enabling designation of a desired interval within the voice data input by said voice data input means; a recording means for appending a check mark, which distinguishes the interval designated using said interval designating means, to the voice data and recording the voice data in a voice data recording medium; and a recording medium attaching means for use in freely detachably attaching said voice data recording medium.
13. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
detect a check mark that is appended to the voice data and distinguishes an interval within the voice data;
not recognize voice represented by a portion of the voice data associated with the given check mark but recognize voice represented by the other portion of the voice data; and
display the result of voice recognition.
14. A voice recognition apparatus, comprising:
a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded;
a level adjusting means for adjusting the sound level of the voice data read by said voice data reading means according to a given procedure;
a voice recognizing means for recognizing voice represented by the voice data whose sound level has been adjusted by said level adjusting means; and
a display means for displaying the result of recognition performed by said voice recognizing means.
15. A voice recognition apparatus, comprising:
a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded;
a voice rating means for rating the voice data read by said voice data reading means as voiceful portions and voiceless portions;
a level adjusting means for adjusting the sound level of the voice data read by said voice data reading means on the basis of absolute values of amplitudes of voice signals of voice data items rated as the voiceful portions by said voice rating means;
a voice recognizing means for inputting the voice data whose sound level has been adjusted by said level adjusting means, and recognizing voice; and
a display means for displaying the result of recognition performed by said voice recognizing means.
16. A voice recognition apparatus according to claim 15, further comprising a minimum value calculating means for calculating a minimum value of an energy level of voice data of a given interval, wherein a criterion of said voice rating means is set on the basis of the minimum value calculated by said minimum value calculating means.
17. A voice recognition apparatus, comprising:
a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded;
a voice rating means for rating the voice data read by said voice data reading means as voiceful portions and voiceless portions;
an averaging means for averaging absolute values of voice data items rated as the voiceful portions by said voice rating means;
a gain calculating means for calculating a gain on the basis of the average value;
a multiplying means for multiplying the voice data by the gain;
a voice recognizing means for recognizing voice represented by the voice data multiplied by the gain; and
a display means for displaying the result of recognition performed by said voice recognizing means.
18. A voice recognition apparatus, comprising:
a voice data reading means for reading voice data of a desired file from a voice data recording medium in which voice data digitized and divided into frames is recorded in units of a file;
a voice rating means for rating the voice data read by said voice data reading means as voiceful frames and voiceless frames;
an averaging means for averaging absolute values of voice data items in frames rated as the voiceful frames by said voice rating means;
a gain calculating means for calculating a gain on the basis of the average value;
a multiplying means for multiplying the voice data by the gain;
a voice recognizing means for recognizing voice represented by the voice data multiplied by the gain; and
a display means for displaying the result of recognition performed by said voice recognizing means.
19. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
adjust the sound level of the read voice data;
recognize voice represented by the voice data whose sound level has been adjusted; and
display the result of voice recognition.
20. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
rate the read voice data as voiceful portions and voiceless portions;
adjust the sound level of the read voice data on the basis of the absolute values of voice data items rated as the voiceful portions according to a given procedure;
recognize voice represented by the voice data whose sound level has been adjusted; and
display the result of voice recognition.
21. A recording medium having a voice recognition program recorded therein, wherein said voice recognition program causes a computer to:
read voice data from a voice data recording medium in which the voice data is recorded;
rate the read voice data as voiceful portions and voiceless portions;
average absolute values of voice data items rated as the voiceful portions;
calculate a gain on the basis of the average value;
multiply the voice data by the gain;
input the voice data multiplied by the gain so as to recognize voice; and
display the result of voice recognition.
22. A voice recognition apparatus according to claim 1, wherein said voice recognition apparatus includes an attachment permitting attachment of said voice data recording medium.
23. A voice recognition apparatus according to claim 22, wherein said voice data recording medium is attached to said attachment via an adaptor.
US09/088,996 1997-06-06 1998-06-02 Speech recognition with text generation from portions of voice data preselected by manual-input commands Expired - Lifetime US6353809B2 (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
JPH9-149729 1997-06-06
JP9-149729 1997-06-06
JP14972997A JP3905181B2 (en) 1997-06-06 1997-06-06 Voice recognition processing device and recording medium recording voice recognition processing program
JP10-011631 1998-01-23
JP10011632A JPH11212595A (en) 1998-01-23 1998-01-23 Voice processor, recording medium recorded with voice recognition program, and recording medium recorded with processing program
JP10-011632 1998-01-23
JP10011631A JPH11212590A (en) 1998-01-23 1998-01-23 Voice processor, recording medium with voice recognition program recorded, and recording medium with processing program recorded

Publications (2)

Publication Number Publication Date
US20010016815A1 (en) 2001-08-23
US6353809B2 (en) 2002-03-05

Family

ID=27279504

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/088,996 Expired - Lifetime US6353809B2 (en) 1997-06-06 1998-06-02 Speech recognition with text generation from portions of voice data preselected by manual-input commands

Country Status (3)

Country Link
US (1) US6353809B2 (en)
EP (1) EP0887788B1 (en)
DE (1) DE69829802T2 (en)

US9100676B2 (en) 1998-07-22 2015-08-04 Touchtunes Music Corporation Audiovisual reproduction system
US8904449B2 (en) 1998-07-22 2014-12-02 Touchtunes Music Corporation Remote control unit for activating and deactivating means for payment and for displaying payment status
US8726330B2 (en) 1999-02-22 2014-05-13 Touchtunes Music Corporation Intelligent digital audiovisual playback system
US8931020B2 (en) 1999-07-16 2015-01-06 Touchtunes Music Corporation Remote management system for at least one audiovisual information reproduction device
US8479240B2 (en) 1999-07-16 2013-07-02 Touchtunes Music Corporation Remote management system for at least one audiovisual information reproduction device
US7996873B1 (en) 1999-07-16 2011-08-09 Touchtunes Music Corporation Remote management system for at least one audiovisual information reproduction device
US9288529B2 (en) 1999-07-16 2016-03-15 Touchtunes Music Corporation Remote management system for at least one audiovisual information reproduction device
US8028318B2 (en) 1999-07-21 2011-09-27 Touchtunes Music Corporation Remote control unit for activating and deactivating means for payment and for displaying payment status
US10846770B2 (en) 2000-02-03 2020-11-24 Touchtunes Music Corporation Process for ordering a selection in advance, digital system and jukebox for embodiment of the process
US8495109B2 (en) 2000-02-16 2013-07-23 Touch Tunes Music Corporation Downloading file reception process
US9608583B2 (en) 2000-02-16 2017-03-28 Touchtunes Music Corporation Process for adjusting the sound volume of a digital sound recording
US7107109B1 (en) * 2000-02-16 2006-09-12 Touchtunes Music Corporation Process for adjusting the sound volume of a digital sound recording
US7992178B1 (en) 2000-02-16 2011-08-02 Touchtunes Music Corporation Downloading file reception process
US9451203B2 (en) 2000-02-16 2016-09-20 Touchtunes Music Corporation Downloading file reception process
US8165318B2 (en) 2000-02-16 2012-04-24 Touchtunes Music Corporation Process for adjusting the sound volume of a digital sound recording
US8873772B2 (en) 2000-02-16 2014-10-28 Touchtunes Music Corporation Process for adjusting the sound volume of a digital sound recording
US10068279B2 (en) 2000-02-23 2018-09-04 Touchtunes Music Corporation Process for ordering a selection in advance, digital system and jukebox for embodiment of the process
US8275668B2 (en) 2000-02-23 2012-09-25 Touchtunes Music Corporation Process for ordering a selection in advance, digital system and jukebox for embodiment of the process
US9129328B2 (en) 2000-02-23 2015-09-08 Touchtunes Music Corporation Process for ordering a selection in advance, digital system and jukebox for embodiment of the process
US9536257B2 (en) 2000-05-10 2017-01-03 Touchtunes Music Corporation Device and process for remote management of a network of audiovisual information reproduction systems
US9152633B2 (en) 2000-05-10 2015-10-06 Touchtunes Music Corporation Device and process for remote management of a network of audiovisual information reproduction systems
US8655922B2 (en) 2000-05-10 2014-02-18 Touch Tunes Music Corporation Device and process for remote management of a network of audiovisual information reproduction systems
US8275807B2 (en) 2000-05-10 2012-09-25 Touchtunes Music Corporation Device and process for remote management of a network of audiovisual information reproduction systems
US7996438B2 (en) 2000-05-10 2011-08-09 Touchtunes Music Corporation Device and process for remote management of a network of audiovisual information reproduction systems
US10007687B2 (en) 2000-05-10 2018-06-26 Touchtunes Music Corporation Device and process for remote management of a network of audiovisual information reproductions systems
US9197914B2 (en) 2000-06-20 2015-11-24 Touchtunes Music Corporation Method for the distribution of audio-visual information and a system for the distribution of audio-visual information
US8469820B2 (en) 2000-06-29 2013-06-25 Touchtunes Music Corporation Communication device and method between an audiovisual information playback system and an electronic game machine
US8214874B2 (en) 2000-06-29 2012-07-03 Touchtunes Music Corporation Method for the distribution of audio-visual information and a system for the distribution of audio-visual information
US9149727B2 (en) 2000-06-29 2015-10-06 Touchtunes Music Corporation Communication device and method between an audiovisual information playback system and an electronic game machine
US9591340B2 (en) 2000-06-29 2017-03-07 Touchtunes Music Corporation Method for the distribution of audio-visual information and a system for the distribution of audio-visual information
US9292999B2 (en) 2000-06-29 2016-03-22 Touchtunes Music Corporation Communication device and method between an audiovisual information playback system and an electronic game machine
US8522303B2 (en) 2000-06-29 2013-08-27 Touchtunes Music Corporation Method for the distribution of audio-visual information and a system for the distribution of audio-visual information
US8863161B2 (en) 2000-06-29 2014-10-14 Touchtunes Music Corporation Method for the distribution of audio-visual information and a system for the distribution of audio-visual information
US8840479B2 (en) 2000-06-29 2014-09-23 Touchtunes Music Corporation Communication device and method between an audiovisual information playback system and an electronic game machine
US9539515B2 (en) 2000-06-29 2017-01-10 Touchtunes Music Corporation Communication device and method between an audiovisual information playback system and an electronic game machine
US9545578B2 (en) 2000-09-15 2017-01-17 Touchtunes Music Corporation Jukebox entertainment system having multiple choice games relating to music
US20040044428A1 (en) * 2002-08-29 2004-03-04 Hiroaki Yoshino Audio processor, audio processing method, computer program, and computer readable storage medium
US7778429B2 (en) * 2002-08-29 2010-08-17 Canon Kabushiki Kaisha Audio processor, audio processing method, computer program, and computer readable storage medium
US11847882B2 (en) 2002-09-16 2023-12-19 Touchtunes Music Company, Llc Digital downloading jukebox with enhanced communication features
US8918485B2 (en) 2002-09-16 2014-12-23 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9165322B2 (en) 2002-09-16 2015-10-20 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US10089613B2 (en) 2002-09-16 2018-10-02 Touchtunes Music Corporation Digital downloading jukebox system with central and local music servers
US11314390B2 (en) 2002-09-16 2022-04-26 Touchtunes Music Corporation Jukebox with customizable avatar
US9202209B2 (en) 2002-09-16 2015-12-01 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US8103589B2 (en) 2002-09-16 2012-01-24 Touchtunes Music Corporation Digital downloading jukebox system with central and local music servers
US11468418B2 (en) 2002-09-16 2022-10-11 Touchtunes Music Corporation Digital downloading jukebox system with central and local music servers
US11049083B2 (en) 2002-09-16 2021-06-29 Touchtunes Music Corporation Digital downloading jukebox system with central and local music servers and payment-triggered game devices update capability
US11567641B2 (en) 2002-09-16 2023-01-31 Touchtunes Music Company, Llc Jukebox with customizable avatar
US11029823B2 (en) 2002-09-16 2021-06-08 Touchtunes Music Corporation Jukebox with customizable avatar
US9015286B2 (en) 2002-09-16 2015-04-21 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US8151304B2 (en) 2002-09-16 2012-04-03 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US10372301B2 (en) 2002-09-16 2019-08-06 Touch Tunes Music Corporation Jukebox with customizable avatar
US9430797B2 (en) 2002-09-16 2016-08-30 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9436356B2 (en) 2002-09-16 2016-09-06 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9015287B2 (en) 2002-09-16 2015-04-21 Touch Tunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9513774B2 (en) 2002-09-16 2016-12-06 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US8930504B2 (en) 2002-09-16 2015-01-06 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9164661B2 (en) 2002-09-16 2015-10-20 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US8751611B2 (en) 2002-09-16 2014-06-10 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US11663569B2 (en) 2002-09-16 2023-05-30 Touchtunes Music Company, Llc Digital downloading jukebox system with central and local music server
US8719873B2 (en) 2002-09-16 2014-05-06 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US8584175B2 (en) 2002-09-16 2013-11-12 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9646339B2 (en) 2002-09-16 2017-05-09 Touchtunes Music Corporation Digital downloading jukebox system with central and local music servers
US10373420B2 (en) 2002-09-16 2019-08-06 Touchtunes Music Corporation Digital downloading jukebox with enhanced communication features
US10783738B2 (en) 2002-09-16 2020-09-22 Touchtunes Music Corporation Digital downloading jukebox with enhanced communication features
US10452237B2 (en) 2002-09-16 2019-10-22 Touchtunes Music Corporation Jukebox with customizable avatar
US10373142B2 (en) 2002-09-16 2019-08-06 Touchtunes Music Corporation Digital downloading jukebox system with central and local music servers
US8473416B2 (en) 2002-09-16 2013-06-25 Touchtunes Music Corporation Jukebox with customizable avatar
US8332895B2 (en) 2002-09-16 2012-12-11 Touchtunes Music Corporation Digital downloading jukebox system with user-tailored music management, communications, and other tools
US9271074B2 (en) * 2005-09-02 2016-02-23 Lsvt Global, Inc. System and method for measuring sound
US20070208558A1 (en) * 2005-09-02 2007-09-06 De Matos Carlos E C System and Method for Measuring Sound
US11756380B2 (en) 2007-01-17 2023-09-12 Touchtunes Music Company, Llc Coin operated entertainment system
US9330529B2 (en) 2007-01-17 2016-05-03 Touchtunes Music Corporation Game terminal configured for interaction with jukebox device systems including same, and/or associated methods
US10249139B2 (en) 2007-01-17 2019-04-02 Touchtunes Music Corporation Coin operated entertainment system
US10970963B2 (en) 2007-01-17 2021-04-06 Touchtunes Music Corporation Coin operated entertainment system
US9171419B2 (en) 2007-01-17 2015-10-27 Touchtunes Music Corporation Coin operated entertainment system
US9953481B2 (en) 2007-03-26 2018-04-24 Touchtunes Music Corporation Jukebox with associated video server
US9324064B2 (en) 2007-09-24 2016-04-26 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10032149B2 (en) 2007-09-24 2018-07-24 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US9990615B2 (en) 2007-09-24 2018-06-05 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10613819B2 (en) 2007-09-24 2020-04-07 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US9041784B2 (en) 2007-09-24 2015-05-26 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10228897B2 (en) 2007-09-24 2019-03-12 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US10057613B2 (en) 2007-09-24 2018-08-21 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US11501333B2 (en) 2008-01-10 2022-11-15 Touchtunes Music Corporation Systems and/or methods for distributing advertisements from a central advertisement network to a peripheral device via a local advertisement server
US8332887B2 (en) 2008-01-10 2012-12-11 Touchtunes Music Corporation System and/or methods for distributing advertisements from a central advertisement network to a peripheral device via a local advertisement server
US8739206B2 (en) 2008-01-10 2014-05-27 Touchtunes Music Corporation Systems and/or methods for distributing advertisements from a central advertisement network to a peripheral device via a local advertisement server
US9953341B2 (en) 2008-01-10 2018-04-24 Touchtunes Music Corporation Systems and/or methods for distributing advertisements from a central advertisement network to a peripheral device via a local advertisement server
US11144946B2 (en) 2008-07-09 2021-10-12 Touchtunes Music Corporation Digital downloading jukebox with revenue-enhancing features
US10169773B2 (en) 2008-07-09 2019-01-01 Touchtunes Music Corporation Digital downloading jukebox with revenue-enhancing features
US10290006B2 (en) 2008-08-15 2019-05-14 Touchtunes Music Corporation Digital signage and gaming services to comply with federal and state alcohol and beverage laws and regulations
US11074593B2 (en) 2008-08-15 2021-07-27 Touchtunes Music Corporation Digital signage and gaming services to comply with federal and state alcohol and beverage laws and regulations
US11645662B2 (en) 2008-08-15 2023-05-09 Touchtunes Music Company, Llc Digital signage and gaming services to comply with federal and state alcohol and beverage laws and regulations
US9774906B2 (en) 2009-03-18 2017-09-26 Touchtunes Music Corporation Entertainment server and associated social networking services
US11775146B2 (en) 2009-03-18 2023-10-03 Touchtunes Music Company, Llc Digital jukebox device with improved karaoke-related user interfaces, and associated methods
US10423250B2 (en) 2009-03-18 2019-09-24 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US10963132B2 (en) 2009-03-18 2021-03-30 Touchtunes Music Corporation Digital jukebox device with improved karaoke-related user interfaces, and associated methods
US10564804B2 (en) 2009-03-18 2020-02-18 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US10579329B2 (en) 2009-03-18 2020-03-03 Touchtunes Music Corporation Entertainment server and associated social networking services
US10977295B2 (en) 2009-03-18 2021-04-13 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11537270B2 (en) 2009-03-18 2022-12-27 Touchtunes Music Company, Llc Digital jukebox device with improved karaoke-related user interfaces, and associated methods
US9959012B2 (en) 2009-03-18 2018-05-01 Touchtunes Music Corporation Digital jukebox device with improved karaoke-related user interfaces, and associated methods
US11520559B2 (en) 2009-03-18 2022-12-06 Touchtunes Music Company, Llc Entertainment server and associated social networking services
US10318027B2 (en) 2009-03-18 2019-06-11 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US10228900B2 (en) 2009-03-18 2019-03-12 Touchtunes Music Corporation Entertainment server and associated social networking services
US10719149B2 (en) 2009-03-18 2020-07-21 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US9076155B2 (en) 2009-03-18 2015-07-07 Touchtunes Music Corporation Jukebox with connection to external social networking services and associated systems and methods
US9292166B2 (en) 2009-03-18 2016-03-22 Touchtunes Music Corporation Digital jukebox device with improved karaoke-related user interfaces, and associated methods
US11093211B2 (en) 2009-03-18 2021-08-17 Touchtunes Music Corporation Entertainment server and associated social networking services
US10782853B2 (en) 2009-03-18 2020-09-22 Touchtunes Music Corporation Digital jukebox device with improved karaoke-related user interfaces, and associated methods
US10789285B2 (en) 2009-03-18 2020-09-29 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11700680B2 (en) 2010-01-26 2023-07-11 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US11259376B2 (en) 2010-01-26 2022-02-22 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11252797B2 (en) 2010-01-26 2022-02-15 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US9521375B2 (en) 2010-01-26 2016-12-13 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11576239B2 (en) 2010-01-26 2023-02-07 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US10768891B2 (en) 2010-01-26 2020-09-08 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11864285B2 (en) 2010-01-26 2024-01-02 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US10901686B2 (en) 2010-01-26 2021-01-26 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11291091B2 (en) 2010-01-26 2022-03-29 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11570862B2 (en) 2010-01-26 2023-01-31 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US10503463B2 (en) 2010-01-26 2019-12-10 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11395023B2 (en) 2011-09-18 2022-07-19 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10582240B2 (en) 2011-09-18 2020-03-03 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10225593B2 (en) 2011-09-18 2019-03-05 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US11368733B2 (en) 2011-09-18 2022-06-21 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10582239B2 (en) 2011-09-18 2020-03-03 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10880591B2 (en) 2011-09-18 2020-12-29 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US10848807B2 (en) 2011-09-18 2020-11-24 Touchtunes Music Corporation Digital jukebox device with karaoke and/or photo booth features, and associated methods
US11151224B2 (en) 2012-01-09 2021-10-19 Touchtunes Music Corporation Systems and/or methods for monitoring audio inputs to jukebox devices
US10749864B2 (en) * 2012-02-24 2020-08-18 Cirrus Logic, Inc. System and method for speaker recognition on mobile devices
US20180152446A1 (en) * 2012-02-24 2018-05-31 Cirrus Logic International Semiconductor Ltd. System and method for speaker recognition on mobile devices
US11545155B2 (en) 2012-02-24 2023-01-03 Cirrus Logic, Inc. System and method for speaker recognition on mobile devices
US20130297308A1 (en) * 2012-05-07 2013-11-07 Lg Electronics Inc. Method for displaying text associated with audio file and electronic device
US10460719B1 (en) * 2013-01-11 2019-10-29 Amazon Technologies, Inc. User feedback for speech interactions
US10950220B1 (en) * 2013-01-11 2021-03-16 Amazon Technologies, Inc. User feedback for speech interactions
US9921717B2 (en) 2013-11-07 2018-03-20 Touchtunes Music Corporation Techniques for generating electronic menu graphical user interface layouts for use in connection with electronic devices
US11714528B2 (en) 2013-11-07 2023-08-01 Touchtunes Music Company, Llc Techniques for generating electronic menu graphical user interface layouts for use in connection with electronic devices
US11409413B2 (en) 2013-11-07 2022-08-09 Touchtunes Music Corporation Techniques for generating electronic menu graphical user interface layouts for use in connection with electronic devices
US10656739B2 (en) 2014-03-25 2020-05-19 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11625113B2 (en) 2014-03-25 2023-04-11 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US11353973B2 (en) 2014-03-25 2022-06-07 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11874980B2 (en) 2014-03-25 2024-01-16 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US10949006B2 (en) 2014-03-25 2021-03-16 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11513619B2 (en) 2014-03-25 2022-11-29 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US11782538B2 (en) 2014-03-25 2023-10-10 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US11327588B2 (en) 2014-03-25 2022-05-10 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US10901540B2 (en) 2014-03-25 2021-01-26 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US11556192B2 (en) 2014-03-25 2023-01-17 Touchtunes Music Company, Llc Digital jukebox device with improved user interfaces, and associated methods
US11137844B2 (en) 2014-03-25 2021-10-05 Touchtunes Music Corporation Digital jukebox device with improved user interfaces, and associated methods
US20160078865A1 (en) * 2014-09-16 2016-03-17 Lenovo (Beijing) Co., Ltd. Information Processing Method And Electronic Device
US10699712B2 (en) * 2014-09-16 2020-06-30 Lenovo (Beijing) Co., Ltd. Processing method and electronic device for determining logic boundaries between speech information using information input in a different collection manner
CN104751846A (en) * 2015-03-20 2015-07-01 努比亚技术有限公司 Method and device for converting voice into text
US10165362B2 (en) * 2015-12-24 2018-12-25 Intel Corporation Automated equalization
US10854200B2 (en) * 2016-08-17 2020-12-01 Panasonic Intellectual Property Management Co., Ltd. Voice input device, translation device, voice input method, and recording medium
US20190005958A1 (en) * 2016-08-17 2019-01-03 Panasonic Intellectual Property Management Co., Ltd. Voice input device, translation device, voice input method, and recording medium
US20190073998A1 (en) * 2017-09-06 2019-03-07 Amazon Technologies, Inc. Voice-activated selective memory for voice-capturing devices
US10796687B2 (en) * 2017-09-06 2020-10-06 Amazon Technologies, Inc. Voice-activated selective memory for voice-capturing devices
CN111052230A (en) * 2017-09-06 2020-04-21 亚马逊科技公司 Selective memory for voice activation of a voice capture device
US11682382B2 (en) * 2017-09-06 2023-06-20 Amazon Technologies, Inc. Voice-activated selective memory for voice-capturing devices
CN107729315A (en) * 2017-09-28 2018-02-23 努比亚技术有限公司 Display methods, terminal and the computer-readable storage medium of audio file
US11043215B2 (en) 2019-03-25 2021-06-22 Yandex Europe Ag Method and system for generating textual representation of user spoken utterance
RU2731334C1 (en) * 2019-03-25 2020-09-01 Общество С Ограниченной Ответственностью «Яндекс» Method and system for generating text representation of user's speech fragment
US20220130409A1 (en) * 2020-10-26 2022-04-28 RINGR, Inc. Systems and methods for multi-party media management

Also Published As

Publication number Publication date
DE69829802D1 (en) 2005-05-25
EP0887788A2 (en) 1998-12-30
DE69829802T2 (en) 2006-03-02
US6353809B2 (en) 2002-03-05
EP0887788A3 (en) 1999-12-15
EP0887788B1 (en) 2005-04-20

Similar Documents

Publication Publication Date Title
US6353809B2 (en) Speech recognition with text generation from portions of voice data preselected by manual-input commands
US6792409B2 (en) Synchronous reproduction in a speech recognition system
US5526407A (en) Method and apparatus for managing information
US9100742B2 (en) USB dictation device
JP3610083B2 (en) Multimedia presentation apparatus and method
JP4558308B2 (en) Voice recognition system, data processing apparatus, data processing method thereof, and program
US6133904A (en) Image manipulation
JP4510953B2 (en) Non-interactive enrollment in speech recognition
RU2223554C2 (en) Speech recognition device
JPH08194500A (en) Apparatus and method for recording of speech for later generation of text
US20030072013A1 (en) Document creation through embedded speech recognition
US6212499B1 (en) Audible language recognition by successive vocabulary reduction
JPH11212590A (en) Voice processor, recording medium with voice recognition program recorded, and recording medium with processing program recorded
JP2006323857A (en) Voice recognition processor, and recording medium recorded with voice recognition processing program
JP2723214B2 (en) Voice document creation device
JP3905181B2 (en) Voice recognition processing device and recording medium recording voice recognition processing program
KR102274275B1 (en) Application and method for generating text link
JP2000075893A (en) Voice recognition device
JP2000089784A (en) Voice recognition system
JP2835320B2 (en) Voice document creation device
JP2555029B2 (en) Voice recognition device
CN116564286A (en) Voice input method and device, storage medium and electronic equipment
JP2000259713A (en) Typical document preparing system based on speech recognition of conversation
JPH09106339A (en) Information processor and data storing method
JPH11184887A (en) Device for storing and retrieving digital information

Legal Events

Date Code Title Description
AS Assignment

Owner name: OLYMPUS OPTICAL CO., LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TAKAHASHI, HIDETAKE;ONISHI, TAKAFUMI;REEL/FRAME:009226/0225

Effective date: 19980522

AS Assignment

Owner name: OLYMPUS OPTICAL CO., LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TAKAHASHI, HIDETAKA;ONISHI, TAKAFUMI;REEL/FRAME:009359/0567

Effective date: 19980522

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: OLYMPUS CORPORATION, JAPAN

Free format text: CHANGE OF NAME;ASSIGNOR:OLYMPUS OPTICAL CO., LTD;REEL/FRAME:014426/0597

Effective date: 20031001

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: OLYMPUS CORPORATION, JAPAN

Free format text: CHANGE OF ADDRESS;ASSIGNOR:OLYMPUS CORPORATION;REEL/FRAME:039344/0502

Effective date: 20160401