The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines

Distant-microphone automatic speech recognition (ASR) remains a challenging goal in everyday environments involving multiple background sources and reverberation. This paper is intended to be a reference on the 2nd ‘CHiME’ Challenge, an initiative designed to analyze and evaluate the performance of ASR systems in a real-world domestic environment. Two separate tracks have been proposed: a small-vocabulary task with small speaker movements and a medium-vocabulary task without speaker movements. We discuss the rationale for the challenge and provide a detailed description of the datasets, tasks and baseline performance results for each track.
