RECURRENT NEURAL NETWORK FEATURE ENHANCEMENT: THE 2nd CHiME CHALLENGE

We apply a machine learning approach to improve noisy acoustic features for robust speech recognition. Specifically, we train a deep recurrent neural network to map noise-corrupted input features to their corresponding clean versions. We introduce several improvements to previously proposed neural network feature enhancement architectures. The model makes no assumptions about the specific noises and distortions present in the CHiME data, but it does assume that noisy and clean stereo pairs are available for training. When used with the standard recognizer on the small vocabulary task (track 1), our approach demonstrates substantial improvements over the challenge baseline.
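The mapping described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual architecture: it shows a single-layer recurrent network whose forward pass converts a noisy feature sequence into an enhanced estimate, together with the mean-squared-error objective against the clean half of a stereo pair. All dimensions, parameter names, and the synthetic data are hypothetical; the real model is deeper and its weights would be learned by backpropagation through time rather than left at random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d = feature dimension, h = hidden units, T = frames.
d, h, T = 24, 32, 50

# Randomly initialized parameters. In practice these would be trained by
# minimizing the MSE below (e.g. with backpropagation through time).
W_in = rng.normal(0.0, 0.1, (h, d))   # input-to-hidden weights
W_rec = rng.normal(0.0, 0.1, (h, h))  # hidden-to-hidden (recurrent) weights
W_out = rng.normal(0.0, 0.1, (d, h))  # hidden-to-output weights
b_h = np.zeros(h)
b_y = np.zeros(d)

def enhance(noisy):
    """Map a noisy feature sequence of shape (T, d) to an enhanced (T, d)."""
    hidden = np.zeros(h)
    out = np.empty_like(noisy)
    for t, x_t in enumerate(noisy):
        hidden = np.tanh(W_in @ x_t + W_rec @ hidden + b_h)
        out[t] = W_out @ hidden + b_y
    return out

def mse(pred, clean):
    """Training objective: mean squared error against the clean stereo pair."""
    return float(np.mean((pred - clean) ** 2))

# A synthetic "stereo pair": clean features and a noise-corrupted copy.
clean = rng.normal(0.0, 1.0, (T, d))
noisy = clean + rng.normal(0.0, 0.5, (T, d))

loss = mse(enhance(noisy), clean)
```

Note that no assumption about the noise type enters the sketch: the only supervision signal is the clean/noisy pairing, which mirrors the stereo-pair assumption stated in the abstract.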
