Combining standard and throat microphones for robust speech recognition

We present a method to combine the standard and throat microphone signals for robust speech recognition in noisy environments. Our approach is to use the probabilistic optimum filter (POF) mapping algorithm to estimate the standard microphone clean-speech feature vectors, used by standard speech recognizers, from both microphones' noisy-speech feature vectors. A small untranscribed "stereo" database (noisy and clean simultaneous recordings) is required to train the POF mappings. In continuous-speech recognition experiments using SRI International's DECIPHER recognition system, both using artificially added noise and using recorded noisy speech, the combined-microphone approach significantly outperforms the single-microphone approach.

[1]  J C Junqua,et al.  The Lombard reflex and its role on human listeners and automatic speech recognizers. , 1993, The Journal of the Acoustical Society of America.

[2]  Leonardo Neumeyer,et al.  Probabilistic optimum filtering for robust speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Andreas Stolcke,et al.  Acoustic Modeling for the SRI Hub4 Partitioned Evaluation Continuous Speech Recognition System , 1997 .

[4]  John F. Holzrichter,et al.  Denoising of human speech using combined acoustic and EM sensor signal processing , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).