A graphical model for multi-sensory speech processing in air-and-bone conductive microphones

In continuation of our previous work on using an air-and-bone-conductive microphone for speech enhancement, in this paper we propose a graphical-model-based approach to estimating the clean speech signal given the noisy observations from the air sensor. We also show how the same model can be used as a speech/non-speech classifier. Mean opinion score (MOS) tests show that the proposed model outperforms our previously proposed direct filtering algorithm.
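
For illustration only (the abstract does not spell out the model's structure), the Python sketch below assumes a per-frequency-bin linear-Gaussian graphical model in the spirit of the earlier direct filtering work: the air microphone observes the clean spectrum plus additive noise, the bone microphone observes a scaled, largely noise-free version of it, and the two observations are fused into an MMSE estimate of the clean speech. All names (estimate_clean_bin, clean_var, air_noise_var, bone_noise_var, h) are hypothetical and not the paper's notation:

    import numpy as np

    # Assumed per-bin model (illustrative, not from the paper):
    #   air  observation:  y = x + v,      v ~ N(0, air_noise_var)
    #   bone observation:  b = h * x + w,  w ~ N(0, bone_noise_var)
    # with a zero-mean Gaussian prior x ~ N(0, clean_var).
    def estimate_clean_bin(y, b, h, clean_var, air_noise_var, bone_noise_var):
        """MMSE estimate of one clean-speech bin from air (y) and bone (b) observations."""
        # Posterior precision: prior precision plus one term per sensor,
        # each weighted by how reliable (low-variance) that sensor is.
        precision = 1.0 / clean_var + 1.0 / air_noise_var + (h * h) / bone_noise_var
        weighted = y / air_noise_var + (h * b) / bone_noise_var
        return weighted / precision

    # Toy usage: one noisy frame with four frequency bins.
    y = np.array([0.8, -0.2, 0.5, 0.1])   # air-microphone observation
    b = np.array([0.6, -0.1, 0.4, 0.05])  # bone-microphone observation
    x_hat = estimate_clean_bin(y, b, h=0.7, clean_var=1.0,
                               air_noise_var=0.5, bone_noise_var=0.1)
    print(x_hat)

Under this kind of model, the same per-bin posteriors could in principle also drive the speech/non-speech decision mentioned above, e.g. by comparing the data likelihood under a "speech present" prior against a noise-only prior; how the paper actually performs the classification is not stated in the abstract.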

[1] Trym Holter et al., "On the feasibility of ASR in extreme noise using the PARAT earplug communication terminal," IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003.

[2] Zicheng Liu et al., "Direct filtering for air- and bone-conductive microphones," IEEE 6th Workshop on Multimedia Signal Processing (MMSP), 2004.

[3] Jacob Benesty et al., "Filtering Techniques for Noise Reduction and Speech Enhancement," 2003.

[4] Kiyohiro Shikano et al., "Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation," IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003.

[5] Xuedong Huang et al., "Air- and bone-conductive integrated microphones for robust speech detection and enhancement," IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003.

[6] Alex Acero et al., "Spoken Language Processing: A Guide to Theory, Algorithm and System Development," 2001.

[7] H. Franco et al., "Combining standard and throat microphones for robust speech recognition," IEEE Signal Processing Letters, 2003.