Speech recognition algorithms for voice control interfaces

Recognition accuracy has been the primary objective of most speech recognition research, and impressive results have been obtained, e.g. a word error rate below 0.3% on a speaker-independent digit recognition task. In real-world applications, however, robustness and real-time response may be more important issues. For the first requirement, we review some of the work on robustness and discuss one specific technique, spectral normalization, in more detail. The requirement of real-time response has to be considered in light of the limited hardware resources in voice control applications, which result from tight cost constraints. In this paper we discuss in detail one specific means of reducing the processing and memory demands: a clustering technique applied at various levels within the acoustic modelling.
