On the interplay between auditory-based features and locally recurrent neural networks for robust speech recognition in noise

The combination of a model of auditory perception (PEMO) as feature extractor and of a Locally Recurrent Neural Network (LRNN) as classifier yields promising ASR results in noise. Our study focuses on the interplay between the two techniques and their ability to complement each other in the task of robust speech recognition. We performed recognition experiments with modifications of PEMO processing concerning amplitude compression and envelope modulation filtering. The results show that the distinct and sparse peaks of the PEMO speech representation, which are well maintained in noise, are sufficient cues for LRNN-based recognition because the LRNN can exploit information that is distributed over time. Enhanced envelope modulation bandpass filtering of PEMO feature vectors better reflects the average modulation spectrum of speech and further decreases the influence of noise.