Wavelet-based voice activity detection algorithm in variable-level noise environment

In this paper, a novel entropy-based voice activity detection (VAD) algorithm is presented for variable-level noise environments. Because different types of noise concentrate their energy in different frequency subbands, noise corrupts each subband to a different degree. The seriously obscured subbands are found to retain little word-signal information and to be harmful for detecting voice activity segments (VAS). First, bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical subbands. To discard the seriously corrupted subbands, a method of adaptive frequency subband extraction (AFSE) is then applied so that only the remaining, less corrupted subbands are used. Next, a measure of entropy defined on the spectrum domain of the selected subbands is proposed to form a robust voice feature parameter. In addition, because unvoiced speech is usually eliminated during detection, an unvoiced detection step is also integrated into the system to improve the intelligibility of the voice. Experimental results show that the algorithm outperforms G.729B and other entropy-based VADs, especially in variable-level background noise.
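As a rough illustration of an entropy measure over selected subbands, the following is a minimal sketch, not the paper's implementation. It assumes the per-frame energies of the 24 subbands are already available and that the AFSE step has produced a keep/discard mask; the abstract does not describe the selection rule, so the mask is simply taken as input. The function name `subband_entropy` and its parameters are hypothetical.

```python
import math


def subband_entropy(energies, keep):
    """Shannon entropy of the normalized energies of the kept subbands.

    energies: non-negative per-subband energies for one frame (assumed input).
    keep:     booleans of the same length, True for subbands that survive
              selection (assumed to come from some AFSE-like step).
    """
    selected = [e for e, k in zip(energies, keep) if k]
    total = sum(selected)
    if total <= 0.0:
        # No energy in the kept subbands: define the entropy as zero.
        return 0.0
    # Treat the normalized energies as a probability distribution.
    probs = [e / total for e in selected]
    # Zero-probability terms contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

With N kept subbands, a perfectly flat energy distribution gives the maximum value log(N), while energy concentrated in a single subband gives a value of zero.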
