Performance Evaluation of CMN for Mel-LPC based Speech Recognition in Different Noisy Environments

This study aims to develop a noise-robust distributed speech recognizer for real-world applications by employing Cepstral Mean Normalization (CMN) for robust feature extraction. The main focus of the work is coping with different noisy environments. To this end, Mel-LP based speech analysis is used for speech coding on the linear frequency scale, with a first-order all-pass filter applied in place of a unit delay. The mismatch between the training and test phases is reduced through robust feature extraction: CMN is applied to the Mel-LP cepstral coefficients in an effort to reduce the effects of additive noise and channel distortion. The performance of the proposed system has been evaluated on test set A of the Aurora-2 database, which is a subset of the TIDigits database contaminated by additive noise and channel effects. The experiment covers four different noisy environments. The baseline performance, that is, the average word accuracy for Mel-LPC, was found to be 59.05%. Applying CMN to the Mel-LP cepstral coefficients improves the average word accuracy to 68.02%, an absolute gain of 8.97 percentage points. CMN is thus found to perform significantly better across the different noisy environments.
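The CMN step described above can be illustrated with a minimal sketch. It assumes per-utterance normalization over a matrix of cepstral coefficients with one row per frame. The function name, the array shape, and the toy data are illustrative assumptions, not the authors' implementation. The sketch relies on the standard observation that a stationary channel distortion appears as an approximately constant offset in the cepstral domain, which subtracting the mean removes.

```python
import numpy as np


def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean from every cepstral coefficient.

    cepstra: array of shape (num_frames, num_coeffs), one row per frame.
    (Shape convention is an assumption made for this sketch.)
    """
    cepstra = np.asarray(cepstra, dtype=float)
    # Mean over time (frames) for each coefficient, kept 2-D for broadcasting.
    mean = cepstra.mean(axis=0, keepdims=True)
    return cepstra - mean


# Toy data (hypothetical): 200 frames of 12 cepstral coefficients.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 12))

# A fixed channel is modeled here as a constant offset per coefficient.
channel_offset = rng.normal(size=(1, 12))
distorted = clean + channel_offset

normalized_clean = cepstral_mean_normalization(clean)
normalized_distorted = cepstral_mean_normalization(distorted)

# After CMN the constant offset is gone, so both versions coincide.
print(np.allclose(normalized_clean, normalized_distorted))
```

In this toy setting the normalized features of the clean and distorted utterances are identical, which is the sense in which CMN reduces the mismatch between training and test conditions. Additive noise is not a constant cepstral offset, so CMN can only be expected to reduce its effect, not remove it.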
