A dual-microphone subband-based Voice Activity Detector using higher-order cumulants

The paper proposes a robust dual-microphone algorithm for Voice Activity Detection (VAD) suitable for detecting speech arriving from random directions. The algorithm uses higher-order statistics (HOS) in the complex subband domain to detect voiced segments and distinguish them from nonharmonic noise. Metrics based on newly established properties of the 2nd- and 4th-order cumulants of complex exponentials are derived. The pros and cons of each metric are analyzed and validated through simulation under various SNR conditions. The results show that the proposed scheme is effective in discriminating voiced speech segments and is robust to Gaussian-like and real-life recorded noises, even at low SNR.
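To illustrate why cumulants of complex exponentials can separate harmonic content from Gaussian-like noise, the following is a minimal single-channel sketch. It is not the paper's dual-microphone metric. It relies only on a standard textbook property: the zero-lag 4th-order cumulant of circular complex Gaussian noise is zero, while that of a constant-amplitude complex exponential of amplitude A is approximately -A^4. The function names `fourth_order_cumulant` and `normalized_cumulant`, and the idea of treating one subband signal as a plain complex sequence, are assumptions made for illustration.

```python
import numpy as np


def fourth_order_cumulant(x):
    """Sample estimate of the zero-lag 4th-order cumulant cum(x, x*, x, x*).

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    x = np.asarray(x, dtype=complex)
    x = x - x.mean()                  # cumulant formula below assumes zero mean
    m2 = np.mean(np.abs(x) ** 2)      # E|x|^2, the 2nd-order cumulant (power)
    m4 = np.mean(np.abs(x) ** 4)      # E|x|^4
    p2 = np.mean(x ** 2)              # E[x^2], the pseudo-variance
    return m4 - 2.0 * m2 ** 2 - np.abs(p2) ** 2


def normalized_cumulant(x):
    """4th-order cumulant divided by squared power, so it is scale-invariant."""
    x = np.asarray(x, dtype=complex)
    x = x - x.mean()
    m2 = np.mean(np.abs(x) ** 2)
    return fourth_order_cumulant(x) / (m2 ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(20000)

    # A single complex exponential, standing in for one harmonic in one subband.
    tone = 0.5 * np.exp(1j * (0.3 * n + 1.0))

    # Circular complex Gaussian noise with unit power.
    noise = (rng.standard_normal(n.size)
             + 1j * rng.standard_normal(n.size)) / np.sqrt(2.0)

    print("tone :", normalized_cumulant(tone))    # expected near -1
    print("noise:", normalized_cumulant(noise))   # expected near 0
```

In this sketch the normalized value sits near -1 for the exponential and near 0 for the Gaussian noise, regardless of signal level, which is the kind of contrast a cumulant-based voicing metric can exploit. How the paper combines the two microphones and the 2nd-order cumulant is not specified in the abstract and is not modeled here.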
