Adaptation for soft whisper recognition using a throat microphone

This paper describes various adaptation methods applied to recognizing soft whisper recorded with a throat microphone. Since the amount of adaptation data is small and the testing data is very different from the training data, a series of adaptation methods is necessary. The adaptation methods include: maximum likelihood linear regression, feature-space adaptation, and re-training with downsampling, sigmoidal low-pass filter, or linear multivariate regression. With these adaptation methods, the word error rate improves from 99.3% to 32.9%.

[1]  Xuedong Huang,et al.  Air- and bone-conductive integrated microphones for robust speech detection and enhancement , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[2]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[3]  Eric Moulines,et al.  Voice transformation using PSOLA technique , 1991, Speech Commun..

[4]  Alex Waibel,et al.  Streamlining the front end of a speech recognizer , 2000, INTERSPEECH.

[5]  Kiyohiro Shikano,et al.  Non-audible murmur recognition input interface using stethoscopic microphone attached to the skin , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[6]  Kiyohiro Shikano,et al.  Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[7]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..