Selection of Reliable Likelihood Ratios for Statistical Model-Based Voice Activity Detection

Statistical model-based voice activity detection (VAD) is robust in noisy conditions. It detects speech regions in an input signal using statistical models of speech and non-speech, such as the complex Gaussian probability density function (PDF). The decision rule of this VAD is based on Bayes’ rule and combines the likelihood ratios (LRs) of all frequency bins. However, this rule may cause decision errors. Using the statistical model, we analyze why these errors occur and show that they can be reduced by using only the LRs at selected frequency bins that have relatively high spectral power in each frame. Performance is evaluated with receiver operating characteristic (ROC) curves and summarized in a table; the proposed methods outperform the conventional statistical model-based VAD.
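The contrast between averaging LRs over all bins and over selected bins can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the standard per-bin LR for complex Gaussian speech and noise models, written in terms of the a posteriori SNR γ_k and the a priori SNR ξ_k. It also assumes a floored maximum-likelihood estimate of ξ_k and a hypothetical selection rule that keeps the `num_selected` bins with the highest observed power |X_k|^2. The names `frame_statistic`, `num_selected`, `XI_MIN` and the threshold `eta` are illustrative. With `num_selected=None`, the function reduces to averaging the log LRs over all bins.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical floor on the a priori SNR (about -25 dB); an assumption, not from the paper.
XI_MIN = 10.0 ** (-25.0 / 10.0)


def ml_a_priori_snr(power: float, noise_var: float) -> float:
    """Maximum-likelihood a priori SNR estimate, floored at XI_MIN.

    This is a simplification; a practical detector would usually use a
    smoothed (e.g. decision-directed) estimate instead.
    """
    return max(power / noise_var - 1.0, XI_MIN)


def log_lr(power: float, noise_var: float, xi: float) -> float:
    """Log likelihood ratio of one frequency bin under complex Gaussian models.

    gamma = |X_k|^2 / lambda_N(k) is the a posteriori SNR and
    xi = lambda_S(k) / lambda_N(k) is the a priori SNR:
        log Lambda_k = gamma * xi / (1 + xi) - log(1 + xi)
    """
    gamma = power / noise_var
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def frame_statistic(
    power: Sequence[float],
    noise_var: Sequence[float],
    num_selected: Optional[int] = None,
) -> float:
    """Average log LR of one frame.

    num_selected=None averages over all bins (the conventional rule).
    Otherwise only the num_selected bins with the highest observed power
    |X_k|^2 are averaged (an assumed selection criterion, for illustration).
    """
    if len(power) != len(noise_var) or len(power) == 0:
        raise ValueError("power and noise_var must be non-empty and equally long")
    logs: List[float] = [
        log_lr(p, n, ml_a_priori_snr(p, n)) for p, n in zip(power, noise_var)
    ]
    if num_selected is None:
        chosen = list(range(len(logs)))
    else:
        if not 1 <= num_selected <= len(logs):
            raise ValueError("num_selected must be between 1 and the number of bins")
        # Indices of the bins with the largest observed spectral power.
        chosen = sorted(range(len(logs)), key=lambda k: power[k], reverse=True)[:num_selected]
    return sum(logs[k] for k in chosen) / len(chosen)


def is_speech(statistic: float, eta: float) -> bool:
    """Declare speech when the averaged log LR exceeds the threshold eta."""
    return statistic > eta
```

For example, with unit noise variance in eight bins and a frame whose power is concentrated in two bins (`[12.0, 9.0, 0.5, 0.8, 0.3, 0.6, 0.4, 0.7]`), `frame_statistic(frame, noise, num_selected=2)` is larger than `frame_statistic(frame, noise)`. This happens because each of the six weak bins contributes a log LR slightly below zero and pulls the full-band average down. How many bins to keep, and whether to rank them by |X_k|^2 or by γ_k, are design choices that this sketch leaves open.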
