A likelihood measure based on projection-based group delay scheme for Mandarin speech recognition in noise

This paper investigates a likelihood measure based on the projection-based group delay scheme (PGDS) that substantially reduces the effect of noise contamination on speech recognition. The norm of the cepstral/GDS vector shrinks when speech signals are corrupted by additive noise. As a result, the HMM parameters, namely the mean vector and the covariance matrix, must be modified accordingly. In this paper, mean vector compensation, a covariance matrix adaptation function, and a state duration method, all based on the PGDS, are incorporated into a semi-continuous HMM to improve the recognition rate in noisy environments. The proposed approach compensates the mean vector using a projection-based scale factor and a mean compensation bias, and adapts the covariance matrix using a variance adaptive function. The bias and the variance adaptive function are estimated from the training and/or testing data and are used to reduce the mismatch between different environments. Finally, a state duration method is used to counter the erroneous state segmentation that additive noise induces along the Viterbi decoding path. Experimental results show that the proposed PGDS approach substantially improves recognition performance in noisy environments.
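The mean and variance compensation can be illustrated with a minimal sketch, written under stated assumptions. It uses a diagonal-covariance Gaussian. It also assumes that the projection-based scale factor is the least-squares coefficient α = ⟨x, μ⟩ / ‖μ‖², which best rescales the clean mean μ toward the noisy observation x. The variance adaptive function is modeled as a single multiplicative factor. The function names and the `bias` and `var_scale` parameters are hypothetical illustrations, not the paper's exact formulas.

```python
import math
from typing import Optional, Sequence


def projection_scale(x: Sequence[float], mu: Sequence[float]) -> float:
    """Scale alpha that minimizes ||x - alpha * mu||^2, i.e. <x, mu> / <mu, mu>."""
    num = sum(xi * mi for xi, mi in zip(x, mu))
    den = sum(mi * mi for mi in mu)
    if den == 0.0:
        raise ValueError("mean vector must be non-zero")
    return num / den


def compensated_log_likelihood(
    x: Sequence[float],
    mu: Sequence[float],
    var: Sequence[float],
    bias: Optional[Sequence[float]] = None,
    var_scale: float = 1.0,
) -> float:
    """Diagonal-Gaussian log-likelihood of x under a projection-compensated model.

    Hypothetical compensation rules (assumptions, not the paper's formulas):
      mean:     mu_hat  = alpha * mu + bias, with alpha = projection_scale(x, mu)
      variance: var_hat = var_scale * var
    """
    if bias is None:
        bias = [0.0] * len(mu)
    alpha = projection_scale(x, mu)
    ll = 0.0
    for xi, mi, vi, bi in zip(x, mu, var, bias):
        m_hat = alpha * mi + bi          # compensated mean component
        v_hat = var_scale * vi           # adapted variance component
        if v_hat <= 0.0:
            raise ValueError("adapted variances must be positive")
        ll -= 0.5 * (math.log(2.0 * math.pi * v_hat) + (xi - m_hat) ** 2 / v_hat)
    return ll


if __name__ == "__main__":
    mu = [2.0, -1.0, 0.5]            # toy clean-speech mean vector
    var = [1.0, 1.0, 1.0]
    x = [0.6 * m for m in mu]        # idealized noisy vector: same direction, shrunk norm
    print(projection_scale(x, mu))   # close to 0.6
    print(compensated_log_likelihood(x, mu, var))
```

In this idealized case, noise only shrinks the vector's norm. The rescaled mean αμ then lines up with the observation, so the shrunk vector is not penalized for the norm mismatch. An uncompensated Gaussian centered at μ would penalize it.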
