Time-Frequency Analysis for Voice Activity Detection

This paper introduces two time-frequency representations for voice activity detection (VAD). The first is a chirp-based spectral representation of the signal obtained with the Fan-Chirp Transform (FChT). The second is based on wavelet decomposition with the Discrete Wavelet Transform (DWT). This is the first implementation of the Fan-Chirp Transform for VAD, and the DWT-based method is one of the few multidimensional approaches in the field. The paper evaluates both methods on clean speech and on speech in noisy conditions, and discusses their limitations.
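
The abstract names the Fan-Chirp Transform but does not say how it is computed. The following Python sketch is an illustration only. It assumes the commonly cited FChT definition, X(f, α) = ∫ x(t) φ'_α(t)^(1/2) exp(-j2πf φ_α(t)) dt with warping function φ_α(t) = (1 + αt/2)·t. It evaluates that integral by resampling the frame on a uniform grid of warped time τ = φ_α(t), weighting each sample by φ'_α(t)^(-1/2), and taking a DFT. Time is measured in samples from the frame centre, and α is in 1/samples. The names `fan_chirp_spectrum` and `best_chirp_rate`, and the "tallest peak" rule for choosing α, are hypothetical and are not the paper's own procedure.

```python
import cmath
import math


def _interp(x, pos):
    """Linear interpolation of list x at fractional index pos (clamped to the ends)."""
    if pos <= 0:
        return x[0]
    if pos >= len(x) - 1:
        return x[-1]
    i = int(pos)
    frac = pos - i
    return (1.0 - frac) * x[i] + frac * x[i + 1]


def fan_chirp_spectrum(frame, alpha):
    """Magnitudes |X(k, alpha)| for the positive DFT bins of one frame.

    Assumption: phi(t) = (1 + alpha*t/2) * t with t in samples from the frame
    centre; 1 + alpha*t must stay positive so that phi is monotonic."""
    n = len(frame)
    c = (n - 1) / 2.0
    t_min, t_max = -c, c
    if 1 + alpha * t_min <= 0 or 1 + alpha * t_max <= 0:
        raise ValueError("alpha too large for this frame length")

    def phi(t):
        return (1 + 0.5 * alpha * t) * t

    tau0, tau1 = phi(t_min), phi(t_max)
    warped = []
    for m in range(n):
        tau = tau0 + (tau1 - tau0) * m / (n - 1)  # uniform grid in warped time
        # Invert phi: root of (alpha/2) t^2 + t - tau = 0 that lies in the frame.
        if abs(alpha) < 1e-12:
            t = tau
        else:
            t = (-1 + math.sqrt(1 + 2 * alpha * tau)) / alpha
        # Sample the frame at t and apply the phi'(t)^(-1/2) change-of-variable weight.
        warped.append(_interp(frame, t + c) / math.sqrt(1 + alpha * t))
    # Naive O(n^2) DFT of the warped samples, positive-frequency bins only.
    return [
        abs(sum(warped[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
        for k in range(n // 2)
    ]


def best_chirp_rate(frame, alphas):
    """Hypothetical selection rule: the alpha whose spectrum has the tallest peak."""
    return max(alphas, key=lambda a: max(fan_chirp_spectrum(frame, a)))
```

When α matches the rate at which a frame's frequency glides linearly, the warped signal becomes stationary and its energy collapses into sharp spectral peaks. A chirp-based VAD feature could exploit that kind of concentration. The naive DFT keeps the sketch dependency-free. An FFT is the obvious replacement for longer frames.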

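The abstract describes the wavelet-based method as multidimensional without giving details. One plausible reading is a per-frame vector of subband energies from a multi-level DWT. The sketch below is built on that assumption. It uses an orthonormal Haar wavelet, a fixed number of levels, and a simple decision rule that compares each band with a noise reference averaged over the first few frames, which are assumed to contain no speech. The wavelet, the rule, the thresholds, and the names `haar_step`, `subband_energies`, and `dwt_vad` are all hypothetical illustrations rather than the paper's algorithm.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("Haar step needs an even-length input")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def subband_energies(frame, levels=4):
    """Energies [D1, ..., D_levels, A_levels] of a multi-level Haar decomposition."""
    if len(frame) % (2 ** levels):
        raise ValueError("frame length must be divisible by 2**levels")
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)  # recurse on the low-pass branch only
        energies.append(sum(v * v for v in detail))
    energies.append(sum(v * v for v in approx))
    return energies


def dwt_vad(frames, levels=4, noise_frames=5, threshold_db=6.0):
    """Hypothetical rule: a frame is speech if any subband energy exceeds the
    mean energy of that subband over the first `noise_frames` frames (assumed
    to be non-speech) by more than `threshold_db` decibels."""
    if len(frames) < noise_frames:
        raise ValueError("need at least noise_frames frames")
    feats = [subband_energies(f, levels) for f in frames]
    # Column-wise mean over the leading frames gives a per-band noise reference.
    ref = [sum(band) / noise_frames for band in zip(*feats[:noise_frames])]
    eps = 1e-12  # avoids log10(0) for silent bands
    decisions = []
    for e in feats:
        gains = [10.0 * math.log10((ei + eps) / (ri + eps)) for ei, ri in zip(e, ref)]
        decisions.append(max(gains) > threshold_db)
    return decisions
```

With `levels=4`, each frame maps to a five-dimensional vector (D1 to D4 plus A4). Because the Haar transform is orthonormal, the band energies sum to the frame energy. A more realistic detector could pass these vectors to a trained classifier instead of applying a fixed decibel rule.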