A novel voice activity detection method using energy statistical complexity

In this paper, the nonlinear dynamic characteristics of statistical complexity are applied to voice activity detection (VAD). By combining statistical complexity with the energy feature, we present a new VAD method, the energy statistical complexity (ESC) algorithm. The thresholds of the ESC feature are estimated with the fuzzy c-means clustering algorithm and the Bayesian information criterion, and a dual-threshold method is then used for VAD. Experiments on the TIMIT continuous speech database show that in low-SNR environments the ESC method is superior to the energy spectrum entropy (ESE) method, with better detection performance especially in vehicle noise and vehicle interior noise environments.
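
To make two of the ingredients named above more concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that "statistical complexity" takes the López-Ruiz, Mancini, and Calbet (LMC) form C = H · D, where H is the normalised Shannon entropy and D is the squared distance from the uniform distribution, and it assumes that the dual-threshold step works as hysteresis over a per-frame feature. The function names `lmc_complexity` and `dual_threshold` and the example values are hypothetical. The abstract does not state how complexity is combined with energy, so the detector simply accepts any per-frame feature sequence.

```python
import math


def lmc_complexity(p):
    """LMC statistical complexity C = H * D of a probability distribution.

    Assumption: `p` has at least two entries and sums to 1 (for example,
    a normalised power spectrum of one frame).
    """
    n = len(p)
    # Normalised Shannon entropy H, in [0, 1]; zero-probability bins are skipped.
    h = -sum(x * math.log(x) for x in p if x > 0) / math.log(n)
    # Disequilibrium D: squared Euclidean distance from the uniform distribution.
    d = sum((x - 1.0 / n) ** 2 for x in p)
    return h * d


def dual_threshold(feature, low, high):
    """Label frames as speech using two thresholds (hysteresis).

    A segment is triggered by a frame with feature >= high and is then
    extended backwards and forwards while neighbouring frames stay >= low.
    """
    n = len(feature)
    speech = [False] * n
    i = 0
    while i < n:
        if feature[i] >= high:
            start = i
            while start > 0 and feature[start - 1] >= low:
                start -= 1
            end = i
            while end + 1 < n and feature[end + 1] >= low:
                end += 1
            for k in range(start, end + 1):
                speech[k] = True
            i = end + 1  # continue scanning after this segment
        else:
            i += 1
    return speech


# Hypothetical per-frame feature values and thresholds, for illustration only.
frames = [0.1, 0.3, 0.9, 0.4, 0.1, 0.3, 0.1]
labels = dual_threshold(frames, low=0.25, high=0.8)
```

In this sketch the frame with value 0.9 triggers a segment that grows to cover its neighbours at 0.3 and 0.4, while the isolated 0.3 later in the sequence is rejected because nothing near it reaches the high threshold. In the method described above, the two thresholds would come from the fuzzy c-means and Bayesian information criterion estimation step rather than being fixed by hand.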
