A robust feature extraction for automatic speech recognition in noisy environments

This paper presents a method for extracting robust speech features when the external noise is additive and has white-noise characteristics. The process consists of a short-time power normalisation whose goal is to preserve the speech features against noise as far as possible. The proposed normalisation is optimal if the corrupted process, like the noise process, has white-noise characteristics. By optimal normalisation we mean that the corrupting noise does not change the means of the observed vectors of the corrupted process at all. Because most of the speech energy is contained in a relatively small frequency band, with most of the band composed of noise or noise-like power, this normalisation process can still capture most of the noise distortions. For signal-to-noise ratios greater than 5 dB, the results show that for stationary white noise the normalisation process, in which the noise characteristics are ignored at the test phase, outperforms conventional Markov model composition, in which the noise is known. If the noise is known, a reasonable approximation of the inverted system can easily be obtained, performing noise compensation that further increases recogniser performance.
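The optimality claim for white processes can be illustrated with a small numerical sketch. The abstract does not specify the exact form of the normalisation, so the code below assumes one plausible reading: each frame's power spectrum is divided by that frame's total power. The function names, frame length, hop size, and noise level are all hypothetical choices made for illustration, not the authors' settings.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def normalised_power_spectrum(x, frame_len=256, hop=128, eps=1e-12):
    """Short-time power spectrum with each frame scaled to unit total power.

    Assumption: this stands in for the paper's short-time power
    normalisation, whose exact definition is not given in the abstract.
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spec / (spec.sum(axis=1, keepdims=True) + eps)

# Synthetic check: a white "signal" corrupted by additive white noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = 0.5 * rng.standard_normal(16000)

feat_clean = normalised_power_spectrum(clean)
feat_noisy = normalised_power_spectrum(clean + noise)

# Largest change in the per-bin feature mean caused by the noise.
mean_shift = np.abs(feat_clean.mean(axis=0) - feat_noisy.mean(axis=0)).max()
print(feat_clean.shape, mean_shift)
```

When both the signal and the noise are white, the normalised spectrum stays flat on average, so the per-bin means barely move after corruption. Real speech is not white, which is why the abstract describes the normalisation as capturing most, rather than all, of the noise distortion.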
