WO2001013215A1

WO2001013215A1 - Device for converting spoken commands and/or spoken texts into keyboard and/or mouse movements and/or texts

Info

Publication number: WO2001013215A1
Application number: PCT/DE2000/002683
Authority: WO
Inventors: Christoph Bueltemann; Heribert Leissner; Tilo Schlumberger; Detlef ZÜNDORF
Original assignee: Genologic Gmbh
Priority date: 1999-08-13
Filing date: 2000-08-08
Publication date: 2001-02-22
Also published as: AU7769400A; DE10082416D2

Abstract

The invention relates to a device for converting spoken commands and/or spoken texts into keyboard and/or mouse movements and/or texts. The aim of the invention is to create a device of this type which ensures a reliable automatic conversion of speech into keyboard commands, mouse movements and/or text, which functions in an efficient and robust manner also in the instance of interferences caused by background noises, and which makes it possible to reliably identify the speaker. To this end, a computer unit (1) comprising a speech recognition unit (2) converts spoken commands or spoken texts input via a microphone (3) into keyboard or mouse commands and/or texts using automatic speech recognition and speaker identification, and transfers them to a computer in the form of digital values via the USB interface (4) or another bi-directional interface (5).

Description

Device for converting voice commands and/or spoken texts into keyboard and/or mouse movements and/or texts

The invention relates to a device for converting voice commands and/or spoken texts into keyboard and/or mouse movements and/or texts.

It is known that a keyboard, a touch screen and/or a computer mouse is used to operate computer systems.

Keyboards as control elements for computer systems have been known since the first days of the PC and its predecessors, around 1980. Computer mice have been used since the first graphical user interface introduced by Apple Inc. in 1986.

The mouse movements are generated by movements of the hand, and a menu item or program command is triggered by pressing the mouse button with the index finger.

Automatic speech recognition can be used both to convert spoken language into keyboard commands and/or mouse movements and texts, and to verify the identity of a user.

Various research projects on speech recognition techniques have been underway since around 1950. Since 1980, recognition capabilities have been significantly improved through the development of statistical methods such as the hidden Markov model (HMM). From the literature (Schukat-Talamazzini, E. G. (1995), Automatic Speech Recognition, Fundamentals, Statistical Models and Efficient Algorithms, Vieweg Verlag, Braunschweig) it is already known that speech recognition methods are based either on the comparison between stored reference patterns and the unknown utterance or on the description of individual words of the vocabulary using stochastic models. In this case, an utterance consisting of digital samples is first broken down into a sequence of speech blocks of a predetermined duration, and then a set of feature quantities is calculated for each speech block. Each set forms a so-called feature vector. In the model-based approach, the statistical properties of the feature quantities are described by means of distribution density functions with corresponding mean values and variances. These mean values and variances must first be determined in a training phase on the basis of a large number of representative training utterances in order to obtain a reference set (a model). To recognize an unknown utterance, probabilities are then calculated for the models that represent the words of the vocabulary.
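The model-based approach described above can be illustrated with a minimal sketch. It assumes two simple feature quantities per speech block (log energy and zero-crossing rate) and one diagonal Gaussian per word; these choices, the helper names and the synthetic `tone` signals are illustrative assumptions and are not taken from the cited literature, which covers far richer features and models such as HMMs.

```python
import math

def tone(freq, n=800, rate=10000):
    """Synthetic stand-in for an utterance (assumption, for demonstration only)."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def frame_blocks(samples, block_len):
    """Break an utterance into consecutive speech blocks of fixed length."""
    return [samples[i:i + block_len]
            for i in range(0, len(samples) - block_len + 1, block_len)]

def feature_vector(block):
    """Feature vector of one block: [log energy, zero-crossing rate]."""
    energy = sum(x * x for x in block) / len(block)
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-10), crossings / (len(block) - 1)]

def train_model(utterances, block_len):
    """Training phase: estimate mean and variance of every feature quantity."""
    vectors = [feature_vector(b)
               for u in utterances for b in frame_blocks(u, block_len)]
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    variances = [
        # A variance floor keeps the density finite for near-constant features.
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors), 1e-6)
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(utterance, model, block_len):
    """Sum of Gaussian log densities over all blocks and feature quantities."""
    means, variances = model
    total = 0.0
    for block in frame_blocks(utterance, block_len):
        for x, m, var in zip(feature_vector(block), means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
    return total

def recognize(utterance, models, block_len):
    """Pick the vocabulary word whose model scores the utterance highest."""
    return max(models,
               key=lambda word: log_likelihood(utterance, models[word], block_len))

if __name__ == "__main__":
    models = {
        "low": train_model([tone(200), tone(220)], 100),
        "high": train_model([tone(2000), tone(2100)], 100),
    }
    print(recognize(tone(210), models, 100))
```

Treating every block independently is the main simplification here; an HMM would additionally model how the feature vectors evolve over time.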

These statistical methods were extended by incorporating and combining them with neural network methods.

All of these methods and devices have in common that they have to run directly on a host PC. They therefore place a heavy load on it and complicate handling, because there are often long waiting times until the recognition process is completed. They require huge resources of CPU power and memory and are unreliable when it comes to speaker identification.

From US Pat. No. 5,659,665 it is known that predefined voice commands are converted into keystroke data and fed into the keyboard interface. The control units for computers and terminals currently available on the market are based on mechanical input devices, in which either a key is pressed or a mouse is moved. This has the disadvantage that this type of operation always has to be learned, so that the handling of computers or terminals remains closed to large parts of the population.

Furthermore, in contrast to automatic voice input and output (according to the present invention), these control units, which are implemented by switches, buttons, keyboards or mice, are considerably more error-prone, more prone to failure and more complex to handle with regard to data input or output. In addition, such systems always require appropriate skills and knowledge of their functionality and operation (e.g. of the keyboard), which often leads to increased expenditure of time and thus increased costs. It is often also a hindrance in the actual work process to take the eyes and hands off the object being processed and/or a document in order to make the entries with the mechanical aid.

The object of the present invention is to provide a device which ensures reliable automatic conversion of speech into keyboard commands, mouse movements and/or text, works efficiently and robustly even in the event of disturbances from background noise, and enables the speaker to be reliably identified.

To solve this problem, it is proposed that a computer unit with a speech recognition unit converts voice commands or spoken texts, input via a microphone, into keyboard or mouse commands and/or texts by means of automatic speech recognition and speaker identification, and transfers them to a computer in the form of digital values via the USB interface or another bidirectional interface.

Any necessary conversion of the transferred data can be carried out using driver software that runs on the computer or terminal.

Using the microphone / speaker combination connected to this unit, commands or data can be returned to the user via voice output.

The above statements are explained in more detail with reference to the following drawings, which show:

Fig. 1 a computer unit with a speech recognition unit, a microphone, and a USB interface or another bidirectional interface, in plan view,

Fig. 2 a computer unit with additional USB connections and a USB distributor, in plan view,

Fig. 3 a computer unit with a PCMCIA (Personal Computer Memory Card International Association) slot, in plan view,

Fig. 4 a computer unit with a speech recognition unit, a speaker identification unit and a speech generation unit, consisting of a clock generator, a CPU (Central Processor Unit), a command memory and/or data memory, a microphone, a loudspeaker and an analog input and output circuit, in plan view.

Fig. 1 shows a computer unit (1) with a speech recognition unit (2). This speech recognition unit (2) is used to convert voice commands or spoken texts, input via a microphone (3), into keyboard or mouse commands and/or texts. The converted digital values are then transferred to a computer or terminal via the USB interface (4) or any other bidirectional interface (5), such as a serial RS232 interface.

Fig. 2 shows the computer unit (1) with further USB connections (6). With the aid of this device, it is possible to combine the converted voice commands or voice data with other data from the peripheral devices and to forward them via the USB distributor (7). For example, the quantity of a goods delivery can be spoken and its product number scanned. The illustrated computer unit (1) combines the data in accordance with the specifications of the host computer and sends it as a coherent key sequence.

Fig. 3 shows a computer unit (1) with a PCMCIA (Personal Computer Memory Card International Association) slot (8). This enables the expansion of the computer unit (1) with a wide variety of PCMCIA cards, such as Ethernet or radio network cards.

Fig. 4 shows a computer unit (1) with a speech recognition unit (2), a speaker identification unit (15) and a speech generation unit (9), consisting of a clock generator (10), a CPU (Central Processor Unit) (11), a command memory and/or data memory (12), a microphone (3), a loudspeaker (13) and an analog input and output circuit (14). With the aid of this device, complex data inputs can first be compiled in a dialog-oriented manner (by means of spoken dialogs with voice output) and then sent to the host computer or terminal as a coherent key sequence.

In the device according to the invention, to convert the voice commands or spoken texts into keyboard or mouse commands and/or texts by means of automatic speech recognition and speaker identification, a voice signal is digitized at a predetermined clock interval, for example 100 µs. The speech signal is changed and/or transformed, and/or upstream algorithms for feature extraction (such as digital filters) are used. The GPs (genetic programs) are additionally and/or exclusively supplied with this signal. The digital signal can be changed and/or transformed in that the phoneme and/or word identification takes place on the basis of neural networks (NN), the classification result being fed to an NN in the form of digital values. The phoneme or word identification can also be based on fuzzy logic (FL); the classification result is then fed to an FL function in the form of digital values. The classification result of the GPs (genetic programs) from the speech signal is used to identify the speaker.
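As a rough illustration of the digitization step and of an upstream digital filter, the sketch below derives the sampling rate from the 100 µs example value and applies a first-order pre-emphasis filter. The description only mentions "digital filters" in general; the choice of pre-emphasis, the coefficient `alpha` and the function names are assumptions made for this example.

```python
SAMPLE_PERIOD_S = 100e-6  # 100 µs, the example value given in the description

def sample_rate_hz(period_s=SAMPLE_PERIOD_S):
    """Sampling rate that corresponds to a given sampling period."""
    return 1.0 / period_s

def pre_emphasis(samples, alpha=0.97):
    """First-order FIR high-pass filter: y[n] = x[n] - alpha * x[n-1].

    Assumed here as one possible 'upstream' filter; it boosts the higher
    frequencies of the digitized signal before features are computed.
    """
    if not samples:
        return []
    filtered = [samples[0]]  # the first sample has no predecessor
    for previous, current in zip(samples, samples[1:]):
        filtered.append(current - alpha * previous)
    return filtered

if __name__ == "__main__":
    print(round(sample_rate_hz()))  # a 100 µs period corresponds to 10 kHz
    print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))
```

A 100 µs period gives a sampling rate of 10 kHz, so signal components up to 5 kHz can be represented.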

The invention is explained with reference to the following examples:

Example 1

The control of a computer mouse and the navigation on the user interface of a computer operating system can be carried out by voice control based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic. This makes it possible to create a computer mouse in which the user alternatively enters the operating system commands directly by voice, opens menus, starts programs, or issues control commands without first moving the mouse pointer to the corresponding position and clicking.

Example 2

For data input and output in logistics, the voice commands entered via the microphone (3) can be combined with the data from other peripheral devices and then transferred as a data stream via the USB interface (4) to a higher-level computer. For example, if an article with a barcode must also be assigned a quantity in an order-picking process, the user can enter the data in any order. The article number is recorded via the scanner connected to the USB interface (4), and the user speaks the picked quantity either before or after scanning. The system can distinguish between the two types of input and passes the complete data record on only a) when all data is available and b) in a predefined form, e.g. first the article number and then the quantity.
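The order-independent combination of scanner and voice input can be sketched as follows. The class name, the method names and the tab-separated record format are hypothetical; the description only requires that the record is passed on once it is complete and in a predefined order.

```python
class PickRecordCombiner:
    """Combines a scanned article number and a spoken quantity (any order)."""

    def __init__(self):
        self._article = None
        self._quantity = None

    def on_scan(self, article_number):
        """Called when the barcode scanner delivers an article number."""
        self._article = article_number
        return self._emit_if_complete()

    def on_spoken_quantity(self, quantity):
        """Called when the speech recognizer delivers a quantity."""
        self._quantity = quantity
        return self._emit_if_complete()

    def _emit_if_complete(self):
        # a) pass the record on only when all data is available
        if self._article is None or self._quantity is None:
            return None
        # b) always in the predefined form: article number first, then quantity
        #    (tab and newline as separators are an assumption)
        record = f"{self._article}\t{self._quantity}\n"
        self._article = None
        self._quantity = None
        return record

if __name__ == "__main__":
    combiner = PickRecordCombiner()
    print(combiner.on_spoken_quantity(12))     # record still incomplete
    print(repr(combiner.on_scan("4711-0815")))  # complete record is emitted
```

Because each input source has its own entry point, the combiner never has to guess whether a value is an article number or a quantity.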

Example 3

In warehousing, the recorded voice data can be transferred via the PCMCIA (Personal Computer Memory Card International Association) slot (8) and a card inserted therein, e.g. a radio modem.

Example 4

With the device according to the invention, it is possible for the speech recognition unit (2) to emulate the keyboard driver. Software runs in the background of the operating system and checks at short intervals (< 50 ms) whether data has arrived at the USB interface (4). This data is then converted into the same system commands that a keyboard driver generates and passed via the API (Application Programming Interface) of the operating system either to the currently active foreground application or to a predefined application. This means that no changes need to be made within the target application, since it already responds to keystrokes.
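The background polling described in this example might look like the following sketch. The USB read and the operating-system API call are replaced by injected callables (`read_available`, `send_key_event`), since the description does not name a concrete API; these names and the 40 ms interval are assumptions.

```python
import time

POLL_INTERVAL_S = 0.04  # assumed value below the 50 ms bound from the text

def poll_once(read_available, send_key_event, target=None):
    """Read pending data once and forward each character as a key event.

    read_available: hypothetical callable returning the text received so far.
    send_key_event: hypothetical callable (character, target) standing in for
                    the operating-system API; target=None means the current
                    foreground application.
    """
    forwarded = 0
    for character in read_available():
        send_key_event(character, target)
        forwarded += 1
    return forwarded

def run_polling(read_available, send_key_event, target=None,
                cycles=None, sleep=time.sleep):
    """Poll repeatedly; cycles=None runs until the process is stopped."""
    done = 0
    while cycles is None or done < cycles:
        poll_once(read_available, send_key_event, target)
        sleep(POLL_INTERVAL_S)
        done += 1

if __name__ == "__main__":
    pending = ["dir", "", "\n"]
    run_polling(lambda: pending.pop(0) if pending else "",
                lambda ch, tgt: print(repr(ch), "->", tgt or "foreground"),
                cycles=3)
```

Injecting the two callables keeps the loop testable without hardware and leaves the choice of the real interface and API open.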

A further function of this software is the conversion of a spoken phrase such as "Open file" into so-called shortcuts. These shortcuts allow the direct activation of a function using key combinations. In the example above, this is "Ctrl + O". This is done using tables and matrices that are created before use.
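A table of this kind can be sketched as a simple lookup. Only the pair "Open file" and Ctrl + O comes from the description; the other entries, the normalization rule and the function name are hypothetical.

```python
# Table created before use: recognized phrase -> key combination.
SHORTCUTS = {
    "open file": ("ctrl", "o"),   # from the description
    "save file": ("ctrl", "s"),   # hypothetical additional entry
    "print": ("ctrl", "p"),       # hypothetical additional entry
}

def to_shortcut(phrase, table=SHORTCUTS):
    """Return the key combination for a recognized phrase, or None.

    The phrase is lower-cased and its whitespace collapsed so that small
    differences in the recognizer output still hit the same table entry.
    """
    normalized = " ".join(phrase.lower().split())
    return table.get(normalized)

if __name__ == "__main__":
    print(to_shortcut("Open file"))
    print(to_shortcut("close window"))
```

Keeping the mapping in data rather than in code lets a separate table be prepared for each target application.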

Example 5

With the device according to the invention, it is also possible for the computer unit (1) to convert voice commands into keyboard and/or mouse commands by means of the speech recognition unit (2), which enables voice-controlled operation of a web browser. The browser is operated by converting the spoken word into shortcuts. These shortcuts allow direct activation of a browser function using key combinations. This is done using tables and matrices that are created before use.

Example 6

With the device according to the invention, the computer unit (1) can use the speech recognition unit (2) to convert voice commands so as to enable the voice-controlled operation of an e-mail program. This eliminates the need to use the mouse and to type on the keyboard.

Furthermore, the device according to the invention also enables voice-controlled operation of a newsreader. Other examples are the voice-controlled operation of a terminal emulation, of database software, of spreadsheet software or of a PPS (production planning and control) system.

Likewise, an ERP system or an accounting system can be operated by voice.

Ultimately, the device can also be used in all applications where mechanical controls cannot be operated, e.g. because both hands are needed for other tasks.

It is an advantage of this invention that it offers a device which enables reliable automatic speech recognition, can simply be connected or integrated as a peripheral device, and replaces the previously conventional mechanical operation of a computer unit with voice operation. This greatly simplifies the operation and use of many software programs. The learning effort for operating the computer is greatly reduced, and work processes become faster and safer, which saves considerable costs and a lot of time.

Claims

1. Device for converting voice commands and/or spoken texts into keyboard and/or mouse movements and/or texts, characterized in that a computer unit (1) with a speech recognition unit (2) converts voice commands or spoken texts, input via a microphone (3), into keyboard or mouse commands and/or texts by means of automatic speech recognition and speaker identification and transfers them to a computer in the form of digital values via the USB interface (4) or another bidirectional interface (5).

2. Device according to claim 1, characterized in that the computer unit (1) additionally has further USB connections (6) and thus implements a USB distributor (7).

3. Device according to one of claims 1 to 2, characterized in that the computer unit (1) can combine the voice commands input via a microphone (3) with the data of other peripheral devices and then transfer them as a data stream via the USB interface (4) or another bidirectional interface (5) to a computer.

4. Device according to one of claims 1 to 3, characterized in that the computer unit (1) has a PCMCIA (Personal Computer Memory Card International Association) slot (8) for receiving peripheral devices, e.g. wireless network cards.

5. Device according to one of claims 1 to 4, characterized in that the computer unit (1) has a speech recognition unit (2), a speaker identification unit (15) and a speech generation unit (9), which comprises a clock generator (10), a CPU (Central Processor Unit) (11), a command memory and/or data memory (12), a microphone (3), a loudspeaker (13) and an analog input and output circuit (14).

6. Device according to one of claims 1 to 5, characterized in that the emulation of the keyboard driver is made possible by the speech recognition unit (2).

7. Device according to one of claims 1 to 6, characterized in that the computer unit (1) converts voice commands into keyboard and/or mouse commands by means of the speech recognition unit (2), which enables the voice-controlled operation of a web browser.

8. Device according to one of claims 1 to 7, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of an e-mail program.

9. Device according to one of claims 1 to 8, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of a newsreader.

10. Device according to one of claims 1 to 9, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of a terminal emulation.

11. Device according to one of claims 1 to 10, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of database software.

12. Device according to one of claims 1 to 11, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of spreadsheet software.

13. Device according to one of claims 1 to 12, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of a PPS (production planning and control) system.

14. Device according to one of claims 1 to 13, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of an ERP system.

15. Device according to one of claims 1 to 14, characterized in that the computer unit (1) converts voice commands by means of the speech recognition unit (2) so as to enable the voice-controlled operation of an accounting system.