The effects of room acoustics on MFCC speech parameter

Automatic speech recognition systems attain high performance for close-talking applications, but they deteriorate significantly in distant-talking environment. The reason is the mismatch between training and testing conditions. We have carried out a research work for a better understanding of the effects of room acoustics on speech feature by comparing simultaneous recordings of close talking and distant talking speech utterances. The characteristics of two degrading sources, background noise and room reverberation are discussed. Their impacts on the spectrum are different. The noise affects on the valley of the spectrum while the reverberation causes the distortion at the peaks at the pitch frequency and its multiples. In the situation of very few training data, we attempt to choose the efficient compensation approaches in the spectrum, spectrum subband or cepstrum domain. Vector Quantization based model is used to study the influence of the variation on feature vector distribution. The results of speaker identification experiments are presented for both close-talking and distant talking data.

[1]  Richard J. Mammone,et al.  Channel-robust speaker identification using modified-mean cepstral mean normalization with frequency warping , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[2]  Harald Singer,et al.  Speaker normalized spectral subband parameters for noise robust speech recognition , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[3]  Til T. Phan,et al.  Text-Independent Speaker Identification , 1999 .

[4]  Alexander H. Waibel,et al.  Knowing who to listen to in speech recognition: visually guided beamforming , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5]  Kazuya Takeda,et al.  Compensating of room acoustic transfer functions affected by change of room temperature , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[6]  James L. Flanagan,et al.  Speech recognition in a reverberant environment using matched filter array (MFA) processing and linguistic-tree maximum likelihood linear regression (LT-MLLR) adaptation , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).