Multi-band long-term signal variability features for robust voice activity detection

In this paper, we propose robust features for voice activity detection (VAD). In particular, we extend the long-term signal variability (LTSV) feature to accommodate multiple spectral bands. The multi-band approach is motivated by the non-uniform frequency scale of speech phonemes and noise characteristics. Our analysis shows that the multi-band approach offers advantages over single-band LTSV for voice activity detection. In terms of classification accuracy, we show a 0.3%-61.2% relative improvement over the best accuracy of the baselines considered for 7 out of 8 different noisy channels. Experimental results and error analysis are reported on the DARPA RATS corpora of noisy speech.

Index Terms: noisy speech data, voice activity detection, robust feature extraction
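As a rough illustration of the multi-band idea, the sketch below computes an LTSV-style measure separately in several spectral bands. It assumes the commonly described LTSV form: the entropy of each frequency bin's normalized power trajectory over the last R frames, followed by the variance of those entropies across bins. The function name `multiband_ltsv`, the equal-width band split, and the defaults `R=30` and `n_bands=4` are hypothetical choices made for illustration, not the configuration used in the paper.

```python
import numpy as np


def multiband_ltsv(power_spec, R=30, n_bands=4, eps=1e-12):
    """LTSV-style feature per band (illustrative sketch, not the paper's exact method).

    power_spec: array of shape (n_frames, n_bins), short-time power spectrum.
    Returns an array of shape (n_frames - R + 1, n_bands).
    """
    power_spec = np.asarray(power_spec, dtype=float)
    n_frames, n_bins = power_spec.shape
    if n_frames < R:
        raise ValueError("need at least R frames")

    # Sliding windows of R consecutive frames: shape (n_windows, n_bins, R).
    windows = np.lib.stride_tricks.sliding_window_view(power_spec, R, axis=0)

    # Normalize each bin's trajectory within the window so it sums to ~1.
    p = windows / (windows.sum(axis=-1, keepdims=True) + eps)

    # Entropy of each bin's normalized trajectory: shape (n_windows, n_bins).
    entropy = -(p * np.log(p + eps)).sum(axis=-1)

    # Assumption: equal-width bands. A perceptually warped split could be
    # substituted here by changing how the bin indices are grouped.
    bands = np.array_split(np.arange(n_bins), n_bands)

    # Variance of the per-bin entropies inside each band.
    return np.stack([entropy[:, idx].var(axis=1) for idx in bands], axis=1)
```

Under these assumptions, a spectrum that does not change over time gives every bin the same entropy, so each band's value is close to zero; time-varying content such as speech spreads the entropies and raises the variance in the bands it occupies.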
