Applying the Bi-level HMM for Robust Voice-activity Detection

This paper presents a voice-activity detection (VAD) method for sound sequences with various signal-to-noise ratios (SNRs). In real-time VAD applications, a separate post-processing step for removing burst clippings from the VAD output decision is inadequate. To address this problem, we build on the bilevel hidden Markov model, in which a state layer is inserted into a typical hidden Markov model (HMM), and formulate a robust VAD method that requires no additional post-processing. The method uses a forward-inference-ratio test to detect speech endpoints and Mel-frequency cepstral coefficients (MFCCs) as features. Our experimental results show that the proposed approach outperforms conventional methods across the tested SNRs.
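The abstract does not define the forward-inference-ratio test, so the following is only a minimal sketch of the general idea under stated assumptions: a plain two-state HMM (non-speech and speech) filtered causally with the forward recursion, where each frame is labeled by thresholding the log ratio of the two forward probabilities. It is not the bilevel model itself. The function names, the single-Gaussian emission models, and the one-dimensional toy features (standing in for MFCC vectors) are all hypothetical.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def forward_ratio_vad(frames, models, trans, prior, threshold=0.0):
    """Causal forward filtering over two states (0 = non-speech, 1 = speech).

    Assumed decision rule (not taken from the paper): a frame is speech when
    log(alpha_speech / alpha_nonspeech) exceeds `threshold`.
    Returns (log_ratios, decisions), one entry per frame.
    """
    alpha = list(prior)  # filtered state probabilities; must be positive
    log_ratios, decisions = [], []
    for t, x in enumerate(frames):
        # Predict step: propagate the previous posterior through the
        # transition matrix (the first frame uses the prior directly).
        if t == 0:
            pred = alpha
        else:
            pred = [sum(alpha[i] * trans[i][j] for i in (0, 1)) for j in (0, 1)]
        # Update step in the log domain to avoid underflow.
        log_un = [
            math.log(pred[j]) + gaussian_loglik(x, *models[j]) for j in (0, 1)
        ]
        ratio = log_un[1] - log_un[0]
        log_ratios.append(ratio)
        decisions.append(ratio > threshold)
        # Normalize to obtain the filtered posterior for the next frame.
        peak = max(log_un)
        weights = [math.exp(v - peak) for v in log_un]
        total = sum(weights)
        alpha = [w / total for w in weights]
    return log_ratios, decisions


# Toy usage with hypothetical 1-D features and hand-picked parameters.
models = [([0.0], [1.0]), ([3.0], [1.0])]  # (mean, variance) per state
trans = [[0.9, 0.1], [0.1, 0.9]]           # "sticky" transitions
prior = [0.5, 0.5]
frames = [[0.1], [-0.2], [3.1], [2.9], [0.0]]
log_ratios, decisions = forward_ratio_vad(frames, models, trans, prior)
print(decisions)
```

Because the transition matrix carries evidence from earlier frames into each decision, isolated frame-level errors are damped inside the recursion itself, which is the kind of behavior that would otherwise be delegated to a post-processing (hangover) stage.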
