METHODOLOGY FOR THE DESIGN OF A ROBUST VOICE ACTIVITY DETECTOR FOR SPEECH ENHANCEMENT

We propose a general methodology for designing a robust voice activity detector tailored to the needs of the speech enhancement system it serves. Rather than imposing rules, we suggest how to analyze the requirements placed on Voice Activity Detection (VAD), how to choose a reference, and how to evaluate the performance of the candidate solutions in order to select the one that fits best. As an example, the methodology is applied to evaluate five VADs, based on features described in the literature, in the context of two typical speech enhancement applications.
