Speech Recognition Using FHMMs Robust Against Nonstationary Noise

We address speech recognition in the presence of sudden, nonstationary noise, which is very likely to occur in home environments. As a model compensation method for this problem, we investigated a factorial hidden Markov model (FHMM) architecture built from a clean-speech hidden Markov model (HMM) and a sudden-noise HMM. Whereas conventional studies define this architecture only for the static features of the observation vector, we extended it to dynamic features. The proposed method was evaluated under noisy conditions on a database recorded in home environments by a personal robot called PaPeRo. Our previous work presented a recognition system using isolated-word FHMMs; here we evaluate the effectiveness of phoneme FHMMs.
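As a rough illustration of how an FHMM can be assembled from a speech HMM and a noise HMM, the sketch below builds the joint state space of two independent chains. The joint transition matrix of independent chains is the Kronecker product of the individual transition matrices. The abstract does not say how the output distributions are combined, so the element-wise maximum of log-spectral means used here (the log-max approximation) is an assumption, and the function names and toy values are hypothetical.

```python
import numpy as np

def product_transitions(a_speech, a_noise):
    """Joint transition matrix of two independent Markov chains.

    Joint state (s, n) is indexed as s * N + n, where N is the
    number of noise states; this matches np.kron's block layout.
    """
    return np.kron(a_speech, a_noise)

def combine_means_logmax(mu_speech, mu_noise):
    """Mean vector for every (speech state, noise state) pair.

    ASSUMPTION: log-spectral features combined with the log-max
    approximation (element-wise maximum). The source abstract does
    not specify the combination rule.
    """
    # Broadcast to shape (S, N, D), then flatten to (S * N, D)
    # so that row s * N + n corresponds to joint state (s, n).
    combined = np.maximum(mu_speech[:, None, :], mu_noise[None, :, :])
    return combined.reshape(-1, mu_speech.shape[1])

# Toy 2-state speech HMM and 2-state noise HMM (hypothetical values).
a_speech = np.array([[0.9, 0.1],
                     [0.0, 1.0]])
a_noise = np.array([[0.8, 0.2],
                    [0.3, 0.7]])
mu_speech = np.array([[1.0, 5.0],
                      [3.0, 2.0]])
mu_noise = np.array([[2.0, 2.0],
                     [0.0, 6.0]])

a_joint = product_transitions(a_speech, a_noise)           # shape (4, 4)
mu_joint = combine_means_logmax(mu_speech, mu_noise)       # shape (4, 2)
```

Each row of `a_joint` still sums to one, so the product model is itself a valid HMM over four joint states. Handling dynamic features, which is the extension the abstract describes, would need a further combination rule that this sketch does not attempt.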
