WO2005020211A1

WO2005020211A1 - Voice-assisted text input for pre-installed applications in mobile devices

Info

Publication number: WO2005020211A1
Application number: PCT/EP2004/051753
Authority: WO
Inventors: Steffen Harengel; Andreas Ralph Major
Original assignee: Siemens Aktiengesellschaft
Priority date: 2003-08-18
Filing date: 2004-08-09
Publication date: 2005-03-03
Also published as: DE10337822A1

Abstract

According to the invention, an interposed voice-recognition application allows letters or commands to be input into any applications running on an operating system. To achieve this, the operating system transfers said letters or commands to the applications in the form of key codes.

Description

Voice-supported text entry for pre-installed applications on mobile devices

Until now, preinstalled programs on mobile devices can only be operated using the input devices built into them. Given the options available, this is mostly cumbersome and slow, and possible only when the operator is at rest. Virtual keyboards or miniature keys, for example, serve as input devices.

Mobile telephones usually offer only a numeric keypad for entering text: a key is pressed several times to select the corresponding letter, which is then shown on the display.

Personal Digital Assistants (PDAs) often offer two different input options: pen input via a virtual keyboard shown on the display, and handwriting recognition, which mostly serves as an alternative to the virtual keyboard. A miniaturized hardware keyboard is sometimes used for somewhat larger devices. The manufacturer of the device decides which input method is used. The input options described here are integrated in the PDAs available on the market.

Web pads / SIM pads likewise offer only a virtual keyboard and handwriting recognition. These devices differ from PDAs only in that their screen and/or touchscreen is considerably larger.

Proceeding from this, the object of the invention is to provide a way to enter input, in particular text, into applications preinstalled on small mobile devices. This object is achieved by the inventions specified in the independent patent claims. Advantageous refinements result from the subclaims.

Accordingly, in a speech recognition method, an application is in a state in which keyboard codes can be fed to it. A speech signal is entered and converted into keyboard codes by a speech recognition unit. The keyboard codes are fed to the application.

The application preferably runs on an operating system. The operating system can be a message-based operating system. The keyboard codes are then sent to the application as messages by the operating system.
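The message path described above can be sketched with a toy message queue standing in for a real operating system. The names `Message`, `ToyOS`, and `TextApp` are hypothetical and serve only as illustration; the speech recognition unit is replaced by a stub that already yields spelled letters.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    """A toy OS message carrying one keyboard code (hypothetical format)."""
    target: str    # name of the receiving application
    key_code: int  # here simply the character code of the letter


class TextApp:
    """Stands in for a preinstalled application that accepts key codes."""

    def __init__(self, name):
        self.name = name
        self.text = ""

    def handle(self, message):
        # The application cannot tell whether the code came from a keyboard
        # or from the speech recognition unit.
        self.text += chr(message.key_code)


class ToyOS:
    """Queues messages and delivers them to the addressed application."""

    def __init__(self):
        self.apps = {}
        self.queue = deque()

    def register(self, app):
        self.apps[app.name] = app

    def post(self, message):
        self.queue.append(message)

    def dispatch_all(self):
        while self.queue:
            message = self.queue.popleft()
            self.apps[message.target].handle(message)


def recognized_letters_to_key_codes(letters):
    """Convert recognizer output (spelled letters) into keyboard codes."""
    return [ord(letter) for letter in letters]


toy_os = ToyOS()
contacts = TextApp("contacts")
toy_os.register(contacts)

# Stub for the speech recognition unit: the user spelled "A", "N", "N", "A".
for code in recognized_letters_to_key_codes(["A", "N", "N", "A"]):
    toy_os.post(Message(target="contacts", key_code=code))
toy_os.dispatch_all()

print(contacts.text)
```

The application class contains no speech-specific code, which mirrors the point that the application only needs to be in a state in which keyboard codes can be fed to it.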

The keyboard codes are, for example, letters.

In addition to the letters, commands can be entered as a speech signal, recognized by the speech recognition unit and fed to the application as keyboard codes. This allows further control of the application.

The method runs in particular on a mobile device with one or more preinstalled applications.

An arrangement that is set up to carry out one of the described methods can be realized, for example, by programming and setting up a data processing system with means corresponding to the method steps mentioned.

A program product for a data processing system that contains code sections with which one of the described methods can be carried out on the data processing system can be realized by suitably implementing the method in a programming language and translating it into code executable by the data processing system. The code sections are stored for this purpose. A program product is understood to mean the program as a tradable product. It can be in any form, for example on paper, on a computer-readable data medium or distributed over a network.

Further advantages and features of the invention result from the description of an exemplary embodiment with reference to the figure, which shows a speech recognition method as a flowchart.

To illustrate the problem underlying the method, an example is given first. In the example, a conventional PDA runs the Windows CE operating system. It is already possible today to start programs on the PDA by automatic speech recognition with a predefined vocabulary. In the example, the "Contacts new entry" program is started in this way in order to reach the input mask for a new contact. The cursor is then automatically placed in the name input field. At this point it becomes clear that word-based speech recognition cannot be used to create a contact: the large vocabulary required would drive resource and CPU consumption so high that implementation on mobile devices is difficult or impossible. To let the user create an individual contact, a spelling recognizer, supplemented with a few voice command words, is used instead of a word-based speech recognizer. The user now spells out the names or words to be entered. After recognition has been completed, the window focus can be shifted from the current input field to the next one by a voice command, for example "next". This process is repeated for all input fields. Once all the desired data has been entered, the new data record can be added to the device database, for example with the "save" command.
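The contact example can be simulated in a few lines. The sketch below is an illustration only: the function `fill_form`, the field names, and the wrap-around behavior of "next" are assumptions, and the recognizer is replaced by a ready-made list of tokens. Capital letters, spaces, and digits are left out for brevity.

```python
import string

LETTERS = set(string.ascii_lowercase)
COMMANDS = {"next", "save"}  # the few command words added to the speller


def fill_form(field_names, tokens):
    """Fill form fields from a stream of spelled letters and command words.

    Returns the saved record, or None if "save" was never spoken.
    """
    values = {name: "" for name in field_names}
    focus = 0  # index of the field that holds the input cursor
    for token in tokens:
        if token in LETTERS:
            values[field_names[focus]] += token
        elif token == "next":
            # Assumption: the focus wraps around after the last field.
            focus = (focus + 1) % len(field_names)
        elif token == "save":
            return values
        else:
            raise ValueError(f"token outside the vocabulary: {token!r}")
    return None


spoken = ["a", "n", "n", "a", "next", "l", "e", "e", "save"]
record = fill_form(["first_name", "last_name"], spoken)
print(record)
print(len(LETTERS | COMMANDS), "vocabulary entries")
```

The whole vocabulary here has 28 entries, yet any name can be entered, which is the reason given above for preferring a spelling recognizer over a word-based one on a resource-constrained device.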

To implement this method on a terminal device that has a message-controlled operating system such as Windows, Unix, Windows CE or Epoc, all that is required is an additional, specially designed speech recognition application. This speech recognition application runs in the background, controls the speech recognizer and sends the recognized letters via operating-system-specific messages to the window of the foreground application that has the input focus. No knowledge of the foreground application is required. This method can therefore be used for any application that uses text input.

The method described here uses the existing communication interfaces of the programs with the operating system on the device.
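As a rough sketch of what sending recognized letters through operating-system messages could look like on Windows, the following uses Python's `ctypes` to call the Win32 functions `GetFocus` and `SendMessageW` with the `WM_CHAR` message. The helper names `char_messages` and `send_to_focus` are hypothetical. Note that on desktop Windows `GetFocus` only reports the focus window attached to the calling thread's message queue, so a separate background process would need additional steps that this sketch does not cover.

```python
import ctypes
import sys

WM_CHAR = 0x0102  # Windows message that delivers one character to a window


def char_messages(text):
    """Build one (message, wParam, lParam) triple per character of `text`."""
    return [(WM_CHAR, ord(ch), 0) for ch in text]


def send_to_focus(text):
    """Send `text` to the focused window; returns the number of messages sent.

    Does real work only on Windows; on other platforms nothing is sent.
    """
    if sys.platform != "win32":
        return 0
    from ctypes import wintypes  # imported here so the sketch loads anywhere

    user32 = ctypes.windll.user32
    user32.GetFocus.restype = wintypes.HWND
    user32.SendMessageW.argtypes = [
        wintypes.HWND, wintypes.UINT, wintypes.WPARAM, wintypes.LPARAM,
    ]
    hwnd = user32.GetFocus()  # None if the calling thread has no focus window
    if not hwnd:
        return 0
    for msg, wparam, lparam in char_messages(text):
        user32.SendMessageW(hwnd, msg, wparam, lparam)
    return len(text)


print(char_messages("Hi"))
```

Separating the construction of the messages from the actual send keeps the platform-independent part easy to inspect without a Windows machine.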

Next, a speech recognition method will be described. In order to better compare the following description with the figure, the individual steps are provided with reference numerals.

Step 1 The application currently activated in the operating system is queried. Using Windows as an example, the return value is a window handle.

Step 2 Next, the window of the active application that has the input focus (cursor) is determined. Again using Windows as an example, this can be done with GetFocus. The return value is the handle to the window that contains the input cursor.

Step 3 Now all preparatory work has been done and the application is in idle mode. This mode monitors whether the active window or the window with the input focus changes. Pressing the PTT (Push to Talk) button starts the speech recognizer.

Step 4 In this step, the automatic speech recognizer is stopped again because a valid recognition result is available. Individual letters and commands can be recognized. If a recognition error occurs, the system jumps back to step 3.

Step 5 If a letter is recognized, the corresponding KeyCode is sent to the window with the input focus as with the keyboard codes of a conventional keyboard. Using Windows as an example, this works with SendMessage, for example.

Step 6 This enters the recognized letter in the field with the focus.

Step 7 If a command is recognized in step 4, this is interpreted and the corresponding KeyCode is sent to the active application.

Step 8 Here the command "next" was recognized, which pushes the window focus to the next input field. Using Windows as an example, this can be achieved by sending the Tab key code.

Step 9 Delete the content of the active field with the "empty" command. In Windows this can be achieved by sending the KeyCode for Backspace until the field is empty.

Step 10 The "back" command is carried out as described in step 9. However, the KeyCode for backspace is only sent once.

Step 11 The recognized command "save" is carried out, in the Windows CE example, by sending the OK code "Enter"; the entered data is thereby written to the database.

In step 8, in addition to "next", "previous" can be recognized to go to the previous input field (KeyCode Shift + Tabulator).
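The branching in steps 4 to 11 can be summarized as a mapping from one recognition result to a sequence of key events. The sketch below is an illustration under stated assumptions: the function `key_events` is hypothetical, the key codes are the Windows virtual-key values, and the length of the focused field is assumed to be known so that "empty" can send the right number of Backspace codes.

```python
# Windows virtual-key codes for the control keys used by the commands.
VK_BACK = 0x08    # Backspace
VK_TAB = 0x09     # Tab
VK_RETURN = 0x0D  # Enter
VK_SHIFT = 0x10   # Shift


def key_events(result, field_length=0):
    """Translate one recognition result into a list of key events.

    Each event is a tuple of key codes pressed together. `field_length` is
    the number of characters in the focused field, needed for "empty".
    Returns None for a recognition error (the caller goes back to step 3).
    """
    if result is None:
        return None                                # step 4: error
    if len(result) == 1 and result.isascii() and result.isalpha():
        # Virtual-key codes of letters equal their uppercase ASCII codes.
        return [(ord(result.upper()),)]            # steps 5-6: letter
    commands = {                                   # step 7: interpret command
        "next": [(VK_TAB,)],                       # step 8
        "previous": [(VK_SHIFT, VK_TAB)],          # step 8, reverse direction
        "empty": [(VK_BACK,)] * field_length,      # step 9
        "back": [(VK_BACK,)],                      # step 10
        "save": [(VK_RETURN,)],                    # step 11
    }
    return commands.get(result)  # unknown words are treated like errors


for spoken in ["a", "next", "previous", "empty", "back", "save", "xyz"]:
    print(spoken, key_events(spoken, field_length=3))
```

Keeping the command table as data means further command words could be added without touching the dispatch logic.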

The method was presented using pure speech recognition. However, this text input method can also be used with other internal or external input devices or when combining different input devices. An interesting solution for transferring text to the mobile device would be to use a scanner pen combined with speech recognition. The text is transferred using a scanner pen and the additional control instructions are given via voice commands, for example "next" for the next field, "save" for saving, etc.

The methods presented have the following advantages in common:

- they work with all message-driven operating systems (Windows, Unix, Windows CE, Symbian OS, etc.),

- they enable fast and natural text input compared to conventional input options,

- all text fields of an application can be filled in by the user via voice or another input device,

- all possible words can be created by spelling,

- text can be adopted 1:1 from other input devices,

- the method can be operated in the dynamic state, i.e. while the user is moving,

- simple operation is also possible for the physically disabled.

Proving use of the method is very simple: one only has to try out whether a self-written program can be operated by voice after installation.

Claims

1. A method of speech recognition for an application, in which

- the application is in a state in which keyboard codes can be fed to it,

- a speech signal is entered and converted into keyboard codes by a speech recognizer,

- the keyboard codes are fed to the application.

2. The method according to claim 1,

- in which the application runs on an operating system,

- in which the operating system is a message-based operating system and the keyboard codes are supplied to the application as messages by the operating system.

3. The method according to any one of the preceding claims, wherein the keyboard codes are letters.

4. The method of claim 3, in which commands are entered in addition to the letters, recognized and supplied to the application as keyboard codes.

5. The method according to any one of the preceding claims, wherein the method runs on a mobile terminal.

6. Arrangement which is set up to carry out a method according to one of the preceding claims.

7. Program product that, when loaded onto a data processing system and executed on it, puts into effect a method according to one of claims 1 to 5 or an arrangement according to claim 6.