Model adaptation by state splitting of HMM for long reverberation

In environments with considerably long reverberation times, each frame of speech is affected by reflected energy components from the preceding frames. Therefore, to adapt the model parameters of a state, it becomes necessary to consider these preceding frames and compute their contributions to the current state. However, the clean-speech frames preceding a state of the HMM are not known during model adaptation. This paper describes a method that estimates the preceding frames for a state of the HMM by splitting the state into a number of substates. The estimated frame sequence can then be used to compute the reflected energy component for the state and to compensate its parameters. The effectiveness of the method was confirmed by experimental results on an isolated-word recognition task.
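
The abstract does not give the compensation equations, but the general idea can be sketched as follows. This is a minimal, illustrative sketch only: the function names, the exponential room-decay model, and the log-domain approximation are assumptions made for the example, not the paper's actual formulation; in the paper the preceding-frame estimates come from the substates produced by state splitting rather than from the toy perturbation used here.

```python
import numpy as np

def compensate_state_mean(clean_mean, preceding_means, decay=0.5):
    """Illustrative reverberation compensation of a state's log-spectral mean.

    clean_mean      : (D,) mean of the clean-speech state
    preceding_means : list of (D,) estimated means of the preceding frames
                      (e.g., taken from substates of earlier states)
    decay           : assumed exponential energy decay per frame of the room
                      impulse response (a modelling assumption, not from the paper)
    """
    # Accumulate reflected energy from preceding frames in the linear spectral
    # domain, weighting older frames less via the decay factor.
    reflected = np.zeros_like(clean_mean)
    for k, mu in enumerate(reversed(preceding_means), start=1):
        reflected += (decay ** k) * np.exp(mu)
    # Add the reflected component to the direct-path energy and return to the
    # log domain (a log-add style approximation).
    return np.log(np.exp(clean_mean) + reflected)


def split_state_means(state_mean, num_substates, jitter=0.01):
    """Toy stand-in for state splitting: derive substate means by small
    perturbations of the original state mean. In the actual method the
    substates would be estimated from data; this placeholder only makes the
    example run end to end."""
    rng = np.random.default_rng(0)
    return [state_mean + jitter * rng.standard_normal(state_mean.shape)
            for _ in range(num_substates)]


if __name__ == "__main__":
    D = 24                                  # log-mel dimension (arbitrary)
    clean_mean = np.random.randn(D)
    substates = split_state_means(clean_mean, num_substates=3)
    adapted = compensate_state_mean(clean_mean, substates, decay=0.4)
    print(adapted.shape)                    # (24,)
```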
