A fragment-decoding plus missing-data imputation ASR system evaluated on the 2nd CHiME Challenge

This paper reports on our entry to the small-vocabulary, moving-talker track of the 2nd CHiME challenge. The system is based on the one we developed for the 1st CHiME challenge, the latest results of which are reported in (Ma and Barker, 2012). Our motivation is to benchmark the system on the new CHiME challenge and to measure how robust it is to speaker motion, a feature of the second challenge that was absent from the first. The paper presents a brief overview of our fragment-decoding plus missing-data imputation system and then gives a component-by-component analysis of system performance on both the 1st and 2nd CHiME challenge datasets. We conclude that, because it relies on pitch and spectral cues, the system is robust to the introduction of small speaker motions. We achieve an average keyword recognition score of 85.9%, compared to 86.3% for the stationary-speaker condition.

[1] Björn W. Schuller et al. Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory, 2013, Comput. Speech Lang.

[2] Ning Ma et al. A hearing-inspired approach for distant-microphone speech recognition in the presence of multiple sources, 2013, Comput. Speech Lang.

[3] Jon Barker et al. The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] James Glass et al. Research Developments and Directions in Speech Recognition and Understanding, Part 1, 2009.

[5] Brian R. Glasberg et al. Derivation of auditory filter shapes from notched-noise data, 1990, Hearing Research.

[6] John McDonough et al. Distant Speech Recognition, 2009.

[7] James R. Glass et al. Developments and directions in speech recognition and understanding, Part 1 [DSP Education], 2009, IEEE Signal Processing Magazine.

[8] Ning Ma et al. The PASCAL CHiME speech separation and recognition challenge, 2013, Comput. Speech Lang.

[9] Israel Cohen et al. Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging, 2003, IEEE Trans. Speech Audio Process.

[10] Jean Paul Haton et al. On noise masking for automatic missing data speech recognition: A survey and discussion, 2007, Comput. Speech Lang.

[11] Rainer Martin et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics, 2001, IEEE Trans. Speech Audio Process.

[12] Masakiyo Fujimoto et al. Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds, 2013, Comput. Speech Lang.

[13] Richard M. Stern et al. Reconstruction of missing features for robust speech recognition, 2004, Speech Commun.

[14] P. Renevey et al. Detection of Reliable Features for Speech Recognition in Noisy Conditions Using a Statistical Criterion, 2001.

[15] Martin Cooke et al. Robust automatic speech recognition with missing and unreliable acoustic data, 2001.

[16] Ning Ma et al. Coupling identification and reconstruction of missing features for noise-robust automatic speech recognition, 2012, INTERSPEECH.