CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments

Voice activity detection (VAD) plays an important role in speech processing tasks such as speech recognition, speech enhancement, and speech coding in noisy environments. We have developed CENSREC-1-C, an evaluation framework for VAD in noisy environments. Because the framework targets simple isolated utterance detection, it consists of noisy continuous digit utterances and evaluation tools for VAD results. We define two evaluation measures: one for frame-level detection performance and one for utterance-level detection performance. We also provide the evaluation results of a power-based VAD method as a reference.
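To make the idea of a power-based detector and a frame-level measure concrete, the following is a minimal sketch and not the CENSREC-1-C reference implementation. The function names, the frame length, the hop size, the fixed threshold in dB, and the choice of false rejection and false acceptance rates as the frame-level measure are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def frame_log_power(samples, frame_len, hop):
    """Return the log power (dB) of each full frame of the signal."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        powers.append(10.0 * math.log10(energy + 1e-12))
    return powers


def power_vad(samples, frame_len=200, hop=80, threshold_db=-30.0):
    """Label a frame as speech when its log power exceeds a fixed threshold.

    The threshold and framing values are illustrative assumptions.
    """
    return [p > threshold_db
            for p in frame_log_power(samples, frame_len, hop)]


def frame_error_rates(reference, hypothesis):
    """Compute assumed frame-level measures from boolean frame labels.

    FRR: fraction of reference speech frames labelled non-speech.
    FAR: fraction of reference non-speech frames labelled speech.
    """
    speech = sum(1 for r in reference if r)
    nonspeech = len(reference) - speech
    rejected = sum(1 for r, h in zip(reference, hypothesis) if r and not h)
    accepted = sum(1 for r, h in zip(reference, hypothesis) if not r and h)
    frr = rejected / speech if speech else 0.0
    far = accepted / nonspeech if nonspeech else 0.0
    return frr, far
```

A fixed threshold like this is sensitive to the noise level, which is one reason a shared evaluation framework with noisy data is useful for comparing such a baseline against more robust methods.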
