Exploiting the potential of auditory preprocessing for robust speech recognition by locally recurrent neural networks

We present a robust speaker-independent speech recognition system consisting of a feature extraction stage based on a model of the auditory periphery and a locally recurrent neural network for scoring the derived feature vectors. A number of recognition experiments were carried out to investigate the robustness of this combination against different types of noise in the test data. The proposed method is compared with cepstral, RASTA, and JAH-RASTA processing for feature extraction and with hidden Markov models (HMMs) for scoring. The results show that the information in the auditory-model features is best exploited by locally recurrent neural networks. The robustness achieved by this combination is comparable to that of JAH-RASTA combined with HMMs, but without requiring explicit adaptation to the noise during speech pauses.
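To make the scoring architecture concrete, the following is a minimal sketch of a locally recurrent layer in the sense used here: each hidden unit feeds back only its own previous activation through a scalar weight (no cross-unit recurrence), and a softmax output layer scores each feature-vector frame against the recognition classes. All names, dimensions, and the random "auditory-model" input below are illustrative assumptions, not the authors' implementation.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class LocallyRecurrentLayer:
    """Hidden layer with per-unit self-feedback only (local recurrence)."""
    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, (n_hidden, n_in))  # input weights
        self.r = rng.normal(0.0, 0.1, n_hidden)          # per-unit self-feedback weights
        self.b = np.zeros(n_hidden)
        self.h = np.zeros(n_hidden)                      # previous activations

    def step(self, x):
        # Only the diagonal term r * h_{t-1}; no cross-unit feedback
        # as in a fully recurrent network.
        self.h = np.tanh(self.W @ x + self.r * self.h + self.b)
        return self.h

class FrameScorer:
    """Scores each feature frame against n_classes recognition units."""
    def __init__(self, n_in, n_hidden, n_classes):
        rng = np.random.default_rng(0)
        self.hidden = LocallyRecurrentLayer(n_in, n_hidden, rng)
        self.V = rng.normal(0.0, 0.1, (n_classes, n_hidden))
        self.c = np.zeros(n_classes)

    def score_utterance(self, frames):
        # frames: (T, n_in) sequence of feature vectors; returns
        # per-frame class posteriors of shape (T, n_classes).
        return np.array([softmax(self.V @ self.hidden.step(x) + self.c)
                         for x in frames])

# Toy usage with random frames standing in for auditory-model features
# (dimensions are arbitrary assumptions).
scorer = FrameScorer(n_in=20, n_hidden=64, n_classes=10)
posteriors = scorer.score_utterance(np.random.randn(50, 20))
print(posteriors.shape)  # (50, 10)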
