A Semi-Continuous State-Transition Probability HMM-Based Voice Activity Detector

We introduce an efficient hidden Markov model (HMM)-based voice activity detection (VAD) algorithm with time-variant state-transition probabilities in the underlying Markov chain. The transition probabilities follow an exponential charge/discharge scheme and are softly merged with the state-conditional likelihoods to form the final VAD decision. Working in the domain of ITU-T G.729 parameters, at no additional cost for feature extraction, the proposed algorithm significantly outperforms the G.729 Annex B VAD while providing a balanced tradeoff between clipping and false-detection errors. Its performance also compares very favorably with the adaptive multirate VAD, option 2 (AMR2).
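The abstract does not give the update equations, so the following is only a minimal sketch of how a charge/discharge transition probability might be combined with state-conditional likelihoods in a two-state (noise/speech) chain. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `charge_discharge` and `vad`, the fixed noise-to-speech probability `a01`, the bounds `a11_min` and `a11_max`, the `rate` constant, the use of a Bayes posterior as the "soft merge", and the 0.5 decision threshold.

```python
import math


def charge_discharge(p, target, rate):
    """Move p exponentially toward target (first-order, RC-style update)."""
    return target + (p - target) * math.exp(-rate)


def vad(loglik_speech, loglik_noise, a01=0.1, a11_min=0.5, a11_max=0.99, rate=0.3):
    """Return one boolean speech decision per frame.

    loglik_speech / loglik_noise: per-frame log-likelihoods of the features
    under hypothetical speech and noise models (how these are obtained from
    G.729 parameters is outside this sketch).
    """
    a11 = a11_min        # time-variant P(speech at t | speech at t-1)
    post = 0.5           # P(speech | observations so far), uninformed start
    decisions = []
    for ls, ln in zip(loglik_speech, loglik_noise):
        # Predict: prior speech probability from the current transition values.
        prior = a11 * post + a01 * (1.0 - post)
        # Update: merge the prior with the likelihood ratio (clamped for safety).
        lr = math.exp(max(-50.0, min(50.0, ls - ln)))
        post = prior * lr / (prior * lr + (1.0 - prior))
        is_speech = post > 0.5
        decisions.append(is_speech)
        # Assumed scheme: charge the stay-in-speech probability during speech,
        # discharge it during noise.
        a11 = charge_discharge(a11, a11_max if is_speech else a11_min, rate)
    return decisions


if __name__ == "__main__":
    # Synthetic log-likelihoods: 5 noise frames, 5 speech frames, 5 noise frames.
    ls = [-5.0] * 5 + [-1.0] * 5 + [-5.0] * 5
    ln = [-1.0] * 5 + [-5.0] * 5 + [-1.0] * 5
    print(vad(ls, ln))
```

In this sketch a longer run of speech frames raises `a11`, so the detector becomes more reluctant to leave the speech state when the likelihoods are ambiguous, which is one plausible way a time-variant transition probability could trade clipping against false detection.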
