Voice activity detection based on conditional MAP criterion incorporating the spectral gradient

In this paper, we propose a novel approach that improves statistical model-based voice activity detection (VAD) through a modified conditional maximum a posteriori (MAP) criterion incorporating the spectral gradient. The proposed conditional MAP criterion takes into account not only the voice activity decision in the previous frame, as in [1], but also the spectral gradient of the observed spectra between the current frame and past frames, so as to exploit the inter-frame correlation of voice activity efficiently. As a result, the likelihood ratio test (LRT) in the proposed VAD uses six separate thresholds, selected adaptively according to both the previous VAD result and the estimated spectral gradient parameter. Experimental results show that the proposed approach outperforms the previous conditional MAP-based method.
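The threshold selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian statistical model for the per-bin likelihood ratio (as in statistical model-based VAD generally), and it assumes the six thresholds arise from two previous-decision states times three quantized spectral gradient states (falling, flat, rising). The function names, the gradient definition, the quantization step `delta`, and all threshold values are hypothetical placeholders.

```python
import math

def log_likelihood_ratio(gamma, xi):
    """Mean per-bin log likelihood ratio under an assumed Gaussian model.

    gamma: a posteriori SNR per frequency bin
    xi:    a priori SNR per frequency bin
    """
    terms = [g * x / (1.0 + x) - math.log1p(x) for g, x in zip(gamma, xi)]
    return sum(terms) / len(terms)

def gradient_state(curr_power, past_power, delta=0.1):
    """Quantize a spectral gradient into -1 (falling), 0 (flat), +1 (rising).

    Assumption: the gradient is taken as the change in log frame energy
    between the current frame and a past frame; the abstract does not
    give the exact definition.
    """
    grad = math.log(sum(curr_power)) - math.log(sum(past_power))
    if grad > delta:
        return 1
    if grad < -delta:
        return -1
    return 0

# Hypothetical threshold table: (previous decision, gradient state) -> threshold.
# The values are placeholders chosen only to make the sketch runnable.
THRESHOLDS = {
    (0, -1): 0.9, (0, 0): 0.7, (0, 1): 0.5,
    (1, -1): 0.4, (1, 0): 0.2, (1, 1): 0.1,
}

def vad_decision(gamma, xi, prev_decision, grad_state):
    """Return 1 (speech) or 0 (non-speech) using the selected threshold."""
    threshold = THRESHOLDS[(prev_decision, grad_state)]
    return 1 if log_likelihood_ratio(gamma, xi) > threshold else 0
```

In this sketch the same observed frame can be classified differently depending on context: a lower threshold applies when the previous frame was speech or the spectrum is rising, which is one way the inter-frame correlation could be exploited.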

[1] Khaled Helmi El-Maleh, et al. Techniques for Digital Coding of Speech-plus-Noise, 2004.

[2] Sanjit K. Mitra, et al. Voice activity detection based on multiple statistical models, 2006, IEEE Transactions on Signal Processing.

[3] Javier Ramírez, et al. Statistical voice activity detection using a multiple observation likelihood ratio test, 2005, IEEE Signal Processing Letters.

[4] Joon-Hyuk Chang, et al. Voice activity detector employing generalised Gaussian distribution, 2004.

[5] Nam Soo Kim, et al. Voice Activity Detection Based on Conditional MAP Criterion, 2008, IEEE Signal Processing Letters.

[6] Joon-Hyuk Chang, et al. Minima-controlled speech presence uncertainty tracking method for speech enhancement, 2011, Signal Process.

[7] Ahmet M. Kondoz, et al. Improved voice activity detection based on a smoothed statistical likelihood ratio, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[8] Joon-Hyuk Chang, et al. Voice activity detection based on complex Laplacian model, 2003.

[9] Joon-Hyuk Chang, et al. Likelihood ratio test with complex Laplacian model for voice activity detection, 2003, INTERSPEECH.

[10] Joon-Hyuk Chang, et al. Statistical modeling of speech signals based on generalized gamma distribution, 2005, IEEE Signal Process. Lett.

[11] David Malah, et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, 1984, IEEE Trans. Acoust. Speech Signal Process.

[12] Wonyong Sung, et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.