Robust automatic speech recognition by the application of a temporal-correlation-based recurrent multilayer neural network to the mel-based cepstral coefficients

This paper considers the problem of robust speech recognition. Our approach is based on reducing the noise in the parameters used for recognition, namely the mel-based cepstral coefficients. A Temporal-Correlation-Based Recurrent Multilayer Neural Network (TCRMNN) performs noise reduction in the cepstral domain, producing parameters that vary less in the presence of noise and are therefore better suited to robust recognition in noisy environments. Experiments show that using these enhanced parameters increases the recognition rate of the continuous speech recognition (CSR) process. The HTK Hidden Markov Model Toolkit was used throughout, and experiments were carried out on a noisy version of the TIMIT database. With this noise reduction technique applied as pre-processing in the front end of the HTK-based CSR system, improvements in recognition accuracy of about 17.77% and 18.58% were obtained at a moderate SNR of 20 dB using single-mixture monophones and triphones, respectively.
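The abstract does not spell out the TCRMNN architecture or its training procedure. As a rough illustration only, the sketch below assumes an Elman-style network. Each input is a window of 2k+1 neighboring noisy cepstral frames, which is one possible way to exploit temporal correlation. A tanh hidden layer also receives its own previous state, and a linear output layer estimates the clean cepstral vector for the center frame. Training minimizes the squared error against parallel clean cepstra and treats the previous hidden state as a fixed input, so it uses a truncated gradient rather than full backpropagation through time. The names `context_windows` and `RecurrentCepstralDenoiser`, the window size `k`, the hidden size and the learning rate are all hypothetical choices, not the paper's.

```python
import numpy as np


def context_windows(frames: np.ndarray, k: int) -> np.ndarray:
    """Concatenate each frame with its k left and k right neighbors.

    Edge frames are repeated, so every row has (2k + 1) * n_ceps values.
    """
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * k + 1].ravel() for t in range(len(frames))])


class RecurrentCepstralDenoiser:
    """Hypothetical Elman-style recurrent MLP: noisy cepstra -> clean cepstra."""

    def __init__(self, n_ceps: int, k: int = 2, n_hidden: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.k = k
        n_in = n_ceps * (2 * k + 1)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # feedback from previous hidden state
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_ceps, n_hidden))
        self.b_out = np.zeros(n_ceps)

    def _forward(self, noisy: np.ndarray):
        X = context_windows(noisy, self.k)
        h = np.zeros_like(self.b_h)
        H_prev, H = [], []
        for x in X:  # run the recurrence frame by frame
            H_prev.append(h)
            h = np.tanh(self.W_in @ x + self.W_rec @ h + self.b_h)
            H.append(h)
        H_prev, H = np.array(H_prev), np.array(H)
        Y = H @ self.W_out.T + self.b_out  # linear output: enhanced cepstral vectors
        return X, H_prev, H, Y

    def enhance(self, noisy: np.ndarray) -> np.ndarray:
        """Return one enhanced cepstral vector per input frame."""
        return self._forward(noisy)[3]

    def train_step(self, noisy: np.ndarray, clean: np.ndarray, lr: float = 0.01) -> float:
        """One gradient step on 0.5 * (summed squared error) / n_frames.

        The previous hidden state is treated as a fixed input (no
        backpropagation through time), as in simple Elman-network training.
        Returns the loss measured before the update.
        """
        X, H_prev, H, Y = self._forward(noisy)
        T = len(X)
        err = (Y - clean) / T                       # dLoss/dY
        loss = 0.5 * float(np.sum((Y - clean) ** 2)) / T
        dA = (err @ self.W_out) * (1.0 - H ** 2)    # back through tanh, using pre-update W_out
        self.W_out -= lr * err.T @ H
        self.b_out -= lr * err.sum(axis=0)
        self.W_in -= lr * dA.T @ X
        self.W_rec -= lr * dA.T @ H_prev
        self.b_h -= lr * dA.sum(axis=0)
        return loss
```

In a front end like the one described, the vectors returned by `enhance` would stand in for the noisy cepstral coefficients passed to the recognizer. Training requires parallel clean and noisy versions of the same utterances, which a noisy copy of TIMIT can provide.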
