An efficient voice activity detection algorithm by combining statistical model and energy detection

In this article, we present a new voice activity detection (VAD) algorithm that combines statistical models with an empirical rule-based energy detection algorithm. It separates speech segments from background noise in two steps. In the first step, the VAD efficiently detects candidate speech endpoints using the empirical rule-based energy detection algorithm. These candidate endpoints, however, are not accurate enough when the signal-to-noise ratio is low. In the second step, we therefore propose a new Gaussian mixture model-based multiple-observation log likelihood ratio algorithm that aligns the endpoints to their optimal positions. Several experiments are conducted to evaluate the proposed VAD in terms of both accuracy and efficiency. The results show that it achieves better performance than six reference VADs in various noise scenarios.
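
The two-step structure described above can be illustrated with a small sketch. It is not the authors' implementation: the energy rule (noise floor plus a fixed margin), the one-dimensional log-energy feature, the hand-set mixture parameters, and the names `coarse_endpoints` and `refine_start` are all assumptions chosen only to make the idea concrete. The first step finds rough endpoints cheaply; the second scores candidate positions near the rough start point by summing per-frame log likelihood ratios over several frames on each side.

```python
import numpy as np

def frame_log_energy(signal, frame_len=160, hop=80):
    """Log of the mean squared amplitude of each overlapping frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.mean(frames ** 2, axis=1) + 1e-12)

def coarse_endpoints(log_e, noise_frames=10, margin=3.0):
    """Step 1 (assumed rule): a frame is active if its log energy exceeds
    the noise floor, estimated from the leading frames, by `margin`."""
    threshold = np.mean(log_e[:noise_frames]) + margin
    active = np.flatnonzero(log_e > threshold)
    if active.size == 0:
        return None
    return int(active[0]), int(active[-1])

def gmm_loglik(x, weights, means, variances):
    """Per-frame log likelihood under a one-dimensional Gaussian mixture."""
    x = np.asarray(x, dtype=float)[:, None]
    w, mu, var = (np.asarray(a, dtype=float) for a in (weights, means, variances))
    log_comp = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var
    peak = log_comp.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]

def refine_start(log_e, start, speech_gmm, noise_gmm, search=5, context=5):
    """Step 2 (assumed criterion): move the start point to the position where
    the summed LLR of the following frames most exceeds that of the preceding ones."""
    llr = gmm_loglik(log_e, *speech_gmm) - gmm_loglik(log_e, *noise_gmm)
    lo = max(context, start - search)
    hi = min(len(llr) - context, start + search)
    best_t, best_score = start, -np.inf
    for t in range(lo, hi + 1):
        score = llr[t:t + context].sum() - llr[t - context:t].sum()
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Synthetic example: quiet noise, a loud burst standing in for speech, quiet noise.
rng = np.random.default_rng(0)
signal = np.concatenate([
    0.01 * rng.standard_normal(4000),
    1.00 * rng.standard_normal(4000),
    0.01 * rng.standard_normal(4000),
])
log_e = frame_log_energy(signal)
start, end = coarse_endpoints(log_e)

# Hand-set mixture parameters (weights, means, variances); hypothetical values.
speech_gmm = ([0.5, 0.5], [-1.0, 0.5], [1.0, 1.0])
noise_gmm = ([1.0], [-9.2], [0.25])
refined = refine_start(log_e, start, speech_gmm, noise_gmm)
print(start, end, refined)
```

In practice the mixture parameters would be estimated from labeled speech and noise data rather than set by hand, and the refinement would be applied to both the start and the end point.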
