Speech recognition technology for mobile phones

Last year, Ericsson was one of the first mobile phone manufacturers to add an important technology to mobile phones. The T18, launched in spring 1999, was the first commercially available Ericsson GSM phone that could be operated by voice commands using automatic speech recognition, in addition to commands entered via the keypad. Other members of the family of telephones using Ericsson’s first generation of speech control algorithms are the T28, R320, and A2618 (Figure 1).

These phones use speech recognition for the new name dialing feature. Thanks to the efficient use of memory, it is currently possible to train and store voice tags for up to 10 entries in the phone book of any of these phones. Each voice tag is trained with a single utterance by the user and assigned to a single phone book entry. To place a call, the user presses a button and speaks the person’s name. The phone plays back the recognized voice tag as acoustic feedback, and then automatically sets up the call. All Ericsson phones with speech recognition capabilities also feature call answering, which allows the user to accept or reject incoming calls using voice commands. This has obvious advantages when the phone is used with hands-free equipment.

Compared to the dictation products commercially available for desktop PCs, the application described here seems elementary. However, mobile phones are used every day, in a variety of locations, with every kind of background noise imaginable. Hence, the key issue for speech recognition in mobile devices is not the size of the vocabulary but the robustness of the recognition system. On the one hand, the phone must recognize speech correctly in very different environments: a quiet office, an airport with conversations going on in the background, or a car traveling at 150 km/h. On the other hand, care must be taken that no incidental noise, such as a closing door or laughter, is mistaken for a valid name, which would lead to a call being set up. In addition, the recognizer should work properly with any type of microphone, at a variety of distances and angles between the mouth and the microphone, and despite changes from handset to hands-free equipment, all without the vocabulary having to be retrained.
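
The name dialing behavior described above (one training utterance per voice tag, at most 10 tags, and rejection of sounds that match no stored name) can be illustrated with a small sketch. The text does not say which algorithm the phones use, so the template matching by dynamic time warping shown here is only an assumed stand-in, not Ericsson's method. The class `VoiceTagStore`, the function `dtw_distance`, the scalar "features", and the threshold value are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

MAX_VOICE_TAGS = 10  # capacity stated in the text


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Length-normalized dynamic time warping distance between two sequences.

    Assumption: each utterance is reduced to a list of scalar features.
    A real recognizer would use feature vectors per frame.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


@dataclass
class VoiceTagStore:
    # Hypothetical rejection threshold; a real system would tune this value.
    reject_threshold: float = 0.5
    tags: Dict[str, List[float]] = field(default_factory=dict)

    def train(self, entry: str, utterance: Sequence[float]) -> None:
        """Store a single utterance as the voice tag for one phone book entry."""
        if not utterance:
            raise ValueError("empty utterance")
        if entry not in self.tags and len(self.tags) >= MAX_VOICE_TAGS:
            raise ValueError("voice tag memory full")
        self.tags[entry] = list(utterance)

    def recognize(self, utterance: Sequence[float]) -> Optional[str]:
        """Return the best-matching entry, or None if nothing is close enough."""
        if not self.tags or not utterance:
            return None
        best_entry, best_score = min(
            ((entry, dtw_distance(utterance, tag)) for entry, tag in self.tags.items()),
            key=lambda pair: pair[1],
        )
        # Rejecting poor matches keeps incidental noise from setting up a call.
        return best_entry if best_score <= self.reject_threshold else None


store = VoiceTagStore()
store.train("Anna", [1, 2, 3, 2, 1])
store.train("Bob", [5, 5, 1, 1, 5])
match = store.recognize([1, 2, 2, 3, 2, 1])  # slower rendition of "Anna"
noise = store.recognize([9, 9, 9, 9])        # resembles neither stored tag
```

In this sketch the rejection threshold carries the trade-off raised in the text: set too loosely, a closing door may trigger a call; set too tightly, valid names spoken in a noisy environment are refused.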

Stefan Dobler