Voice Activity Detection Algorithm with Low Signal-to-Noise Ratios Based on Spectrum Entropy

This letter presents a robust voice activity detection (VAD) algorithm for noisy environments. The proposed VAD uses an entropy measure defined in the band-splitting spectrum domain to exploit the formant frequency representation, a highly efficient and compact representation of the time-varying characteristics of speech. In addition, the Teager energy operator (TEO) can be employed prior to the entropy-based measurement to provide a better representation of formant information, which improves speech/non-speech classification. The results show that the proposed algorithm performs better overall than the standard ITU-T G.729B VAD and Shen's entropy-based VAD.
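The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the letter's implementation: the function names, the frame length, the number of bands (8), the equal-width band split, and the use of a naive DFT are all assumptions made for illustration. The abstract also does not say exactly how the TEO output is fed into the entropy measure, so the sketch only shows the two operations separately. The TEO follows the standard discrete form psi[n] = x[n]^2 - x[n-1]*x[n+1], and the entropy is the Shannon entropy of the normalized band energies.

```python
import cmath
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_spectral_entropy(frame, num_bands=8):
    """Shannon entropy (in nats) of the energy distribution over bands.

    Assumption: equal-width bands and the DC bin is discarded; the letter
    may partition the spectrum differently.
    """
    spectrum = power_spectrum(frame)[1:]  # drop the DC bin
    width = len(spectrum) // num_bands
    energies = [
        sum(spectrum[b * width:(b + 1) * width]) for b in range(num_bands)
    ]
    total = sum(energies)
    if total <= 0.0:
        return 0.0  # silent frame: no distribution to measure
    probs = [e / total for e in energies]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical test frames of 64 samples each.
N = 64
tone = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]  # one narrow peak
impulse = [1.0] + [0.0] * (N - 1)                              # flat spectrum

tone_entropy = band_spectral_entropy(tone)
flat_entropy = band_spectral_entropy(impulse)
tone_teo = teager_energy(tone)
```

In this sketch a frame whose energy is concentrated in one band (the tone) yields an entropy near zero, while a frame with a flat spectrum (the impulse) reaches the maximum of log(num_bands). This contrast between peaked and flat spectra is what an entropy-based detector thresholds on. For a pure sinusoid of unit amplitude, the TEO output is constant and equal to sin^2 of the digital frequency, which is why it tracks both amplitude and frequency.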

[1]  K.-C. Wang et al. Robust endpoint detection algorithm based on the adaptive band-partitioning spectral entropy in adverse environments, 2005, IEEE Transactions on Speech and Audio Processing.

[2]  J. F. Kaiser et al. On a simple algorithm to calculate the 'energy' of a signal, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[3]  I. Boyd et al. The voice activity detector for the Pan-European digital cellular mobile telephone service, 1988, International Conference on Acoustics, Speech, and Signal Processing.

[4]  A. Enis Çetin et al. Teager energy based feature parameters for speech recognition in car noise, 1999, IEEE Signal Processing Letters.

[5]  Jeih-Weih Hung et al. Robust entropy-based endpoint detection for speech recognition in noisy environments, 1998, ICSLP.