Robust voice activity detection directed by noise classification

In this paper, voice activity detection (VAD) is formulated as a two-class classification problem using support vector machines (SVMs). The proposed method combines noise-robust feature extraction with SVM models trained in different background noises for speech/non-speech classification. A multi-class SVM is also used to classify the background noise in order to select the SVM model used for VAD. The proposed VAD is tested on TIMIT data artificially distorted by different additive noise types and is compared with state-of-the-art VADs. Experimental results show that the proposed VAD can extract speech activity under poor SNR conditions and that it is insensitive to variable noise levels.
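The control flow described above, in which a noise classifier selects the speech/non-speech model matched to the current background noise, can be illustrated with a minimal sketch. It is not the authors' implementation. The trained SVMs are replaced by simple stand-ins: a linear decision function for each per-noise VAD model and a nearest-centroid rule for the multi-class noise classifier. The names `LinearModel`, `classify_noise`, and `detect_speech`, the one-dimensional features, and the assumption that a few noise-only frames are available for noise classification are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per analysis frame


@dataclass
class LinearModel:
    """Stand-in for a trained two-class SVM: speech if w . x + b > 0."""
    weights: List[float]
    bias: float

    def decide(self, x: Frame) -> bool:
        score = sum(w * v for w, v in zip(self.weights, x)) + self.bias
        return score > 0.0


def classify_noise(noise_frames: Sequence[Frame],
                   centroids: Dict[str, Frame]) -> str:
    """Stand-in for the multi-class noise SVM: nearest centroid of the mean frame."""
    dim = len(noise_frames[0])
    mean = [sum(f[i] for f in noise_frames) / len(noise_frames)
            for i in range(dim)]

    def sq_dist(name: str) -> float:
        return sum((m - c) ** 2 for m, c in zip(mean, centroids[name]))

    return min(centroids, key=sq_dist)


def detect_speech(frames: Sequence[Frame],
                  noise_frames: Sequence[Frame],
                  centroids: Dict[str, Frame],
                  vad_models: Dict[str, LinearModel]) -> Tuple[str, List[bool]]:
    """Pick the VAD model matched to the detected noise type, then label frames."""
    noise_type = classify_noise(noise_frames, centroids)
    model = vad_models[noise_type]
    return noise_type, [model.decide(f) for f in frames]


# Toy setup (hypothetical values): one feature per frame, e.g. a log energy.
CENTROIDS = {"white": [1.0], "babble": [3.0]}
VAD_MODELS = {
    "white": LinearModel(weights=[1.0], bias=-2.0),   # speech if feature > 2
    "babble": LinearModel(weights=[1.0], bias=-4.0),  # speech if feature > 4
}

if __name__ == "__main__":
    frames = [[1.1], [0.9], [3.5], [5.0]]
    print(detect_speech(frames, frames[:2], CENTROIDS, VAD_MODELS))
```

In this toy setup the same frame (feature value 3.5) is labeled speech under the "white" model but non-speech under the "babble" model, which is the reason for directing model selection by noise type. In practice both stand-ins would be replaced by trained SVMs.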
