Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems

As speech recognition technology becomes a more integral part of mobile applications, it is increasingly important to develop robust speech recognition systems that maintain a high level of recognition accuracy in difficult and dynamically varying acoustic environments. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs feature parameter extraction, i.e. the front-end processing of the speech recognition system, and the resulting features are transmitted over a data channel to the remote back-end recogniser. For mobile devices, DSR offers particular benefits: improved recognition performance compared with recognition over the voice channel, and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into a DSR system must operate in real time and at the lowest possible computational cost.

In this paper, two innovative front-end processing techniques for noise-robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques combine different forms of frame-attenuation, an improved spectral subtraction based on minimum statistics, and a mel-cepstrum feature extraction procedure. Tests are performed on the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database using the HTK speech recognition toolkit. The results are especially encouraging for mobile DSR systems with limited memory and processing power.
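
The abstract does not spell out the authors' improved spectral subtraction. The following is therefore only a minimal sketch of the underlying idea, not the method evaluated in the paper:

- It performs power-domain spectral subtraction with an over-subtraction factor and a spectral floor.
- Its noise estimate follows the spirit of Rainer Martin's minimum statistics: track the per-bin minimum of a recursively smoothed power spectrum over a sliding window of frames.
- It works on precomputed power spectra, one list per frame.

The function name `min_stats_spectral_subtraction` and every parameter value (smoothing constant, window length, fixed bias factor, over-subtraction factor, floor) are hypothetical. Martin's method uses time-varying optimal smoothing and an adaptive bias compensation rather than the constants assumed here.

```python
from collections import deque
from typing import Deque, List


def min_stats_spectral_subtraction(
    power_frames: List[List[float]],
    alpha: float = 0.85,  # recursive smoothing constant (assumed value)
    window: int = 50,  # number of frames searched for the minimum (assumed value)
    bias: float = 1.5,  # fixed bias compensation (assumed; Martin's is adaptive)
    over_sub: float = 2.0,  # over-subtraction factor (assumed value)
    floor: float = 0.01,  # spectral floor relative to the noisy power (assumed value)
) -> List[List[float]]:
    """Return one enhanced power spectrum per input frame (hypothetical helper)."""
    n_bins = len(power_frames[0])
    smoothed = list(power_frames[0])
    history: Deque[List[float]] = deque(maxlen=window)
    enhanced: List[List[float]] = []
    for frame in power_frames:
        # First-order recursive smoothing of the noisy power in each frequency bin.
        smoothed = [alpha * s + (1.0 - alpha) * p for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        # Noise estimate: per-bin minimum of the smoothed power over the last
        # `window` frames, scaled up because a minimum is biased towards low values.
        noise = [bias * min(past[k] for past in history) for k in range(n_bins)]
        # Power spectral subtraction with over-subtraction and a spectral floor.
        enhanced.append([max(p - over_sub * n, floor * p) for p, n in zip(frame, noise)])
    return enhanced


# Usage: 20 noise-only frames of unit power, then one frame with a strong tone in bin 0.
frames = [[1.0, 1.0] for _ in range(20)] + [[100.0, 1.0]]
clean = min_stats_spectral_subtraction(frames)
print([round(v, 2) for v in clean[-1]])  # → [97.0, 0.01]
```

The minimum is taken over past smoothed spectra, so a short speech burst barely raises the noise estimate. This is why a minimum-statistics tracker can follow slowly varying noise without an explicit voice activity detector.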
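
The mel-cepstrum feature extraction named in the abstract can be illustrated with a short sketch. It assumes the power spectrum of one frame is already available and then:

1. applies a bank of triangular filters spaced uniformly on the mel scale;
2. takes the logarithm of each filter output;
3. decorrelates the log energies with an unnormalised DCT-II.

The function names, the mel formula 2595·log10(1 + f/700) and the parameter values (8 kHz sampling, 256-point FFT, 23 filters, 13 coefficients) are assumptions chosen for illustration. They are not the configuration reported in the paper.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters spaced uniformly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 equally spaced mel points give each triangle's lower edge, centre and upper edge.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_cepstrum(power: List[float], bank: List[List[float]], n_ceps: int = 13) -> List[float]:
    """Log mel filterbank energies followed by an unnormalised DCT-II."""
    # A small floor keeps log() defined for filters that receive no energy.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in bank]
    m = len(log_e)
    return [
        sum(e * math.cos(math.pi * i * (j + 0.5) / m) for j, e in enumerate(log_e))
        for i in range(n_ceps)
    ]


# Usage: 8 kHz telephone speech, 256-point FFT (129 bins), 23 filters, 13 coefficients.
bank = mel_filterbank(23, 256, 8000.0)
flat = mel_cepstrum([1.0] * 129, bank)
louder = mel_cepstrum([10.0] * 129, bank)
```

A uniform gain change in the input spectrum adds the same constant to every log energy. The DCT of a constant offset is nonzero only in its zeroth term, so in this sketch only c0 changes between `flat` and `louder`.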
