Combined use of close-talk and throat microphones for improved speech recognition under non-stationary background noise

This paper summarizes recent developments and experimental results in Automatic Speech Recognition (ASR) using signals captured with a throat microphone. Because the sensor sits close to the voice source, its signal is inherently less affected by background noise. However, the resulting speech has a different frequency content than speech captured with conventional microphones, so dedicated acoustic models are required. We propose to exploit both signals by combining the probability vectors produced by the two acoustic models. The systems are evaluated on a connected-digit recognition task in French. A database containing both throat-microphone and "ordinary" close-talk signals was recorded, both to train the acoustic models and to test the complete setup. To avoid potentially unrealistic assumptions about how noise affects each signal, the test portion was recorded with background noise played back through loudspeakers. Our ASR experiments demonstrate the benefit of using alternative microphones: relative recognition improvements of up to 80% were obtained on digit sequences recorded in a loud musical environment.
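
The abstract does not state which combination rule is applied to the two probability vectors. As a minimal sketch only, the code below assumes each acoustic model outputs, for every frame, a posterior vector over the same set of classes, and merges the two vectors either with a weighted sum or with a weighted product (log-linear combination), followed by renormalization. The function name `combine_posteriors`, the `weight` parameter (share given to the close-talk stream) and both rules are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def combine_posteriors(p_close: Sequence[float], p_throat: Sequence[float],
                       weight: float = 0.5, rule: str = "product") -> List[float]:
    """Merge two per-frame posterior vectors scored over the same classes.

    Hypothetical helper: `weight` is the share given to the close-talk
    stream; the throat-microphone stream receives 1 - weight.
    """
    if len(p_close) != len(p_throat):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    eps = 1e-12  # floor that keeps log() defined for zero probabilities
    if rule == "sum":
        # Weighted arithmetic mean of the two posterior vectors
        combined = [weight * a + (1.0 - weight) * b
                    for a, b in zip(p_close, p_throat)]
    elif rule == "product":
        # Weighted geometric mean, computed in the log domain
        combined = [math.exp(weight * math.log(max(a, eps))
                             + (1.0 - weight) * math.log(max(b, eps)))
                    for a, b in zip(p_close, p_throat)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(combined)
    # Renormalize so the result is again a probability vector
    return [c / total for c in combined]


if __name__ == "__main__":
    noisy_close = [0.2, 0.5, 0.3]   # made-up close-talk posteriors for one frame
    throat = [0.7, 0.2, 0.1]        # made-up throat-microphone posteriors
    print(combine_posteriors(noisy_close, throat, weight=0.5, rule="sum"))
    print(combine_posteriors(noisy_close, throat, weight=0.5, rule="product"))
```

The product rule strongly penalizes any class that either stream considers unlikely, whereas the sum rule is more tolerant when one stream fails badly. Setting `weight` to 1.0 or 0.0 reduces to using a single microphone, which gives a simple baseline against which the combination can be compared.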
