Robust Voice Activity Detection Using Long-Term Signal Variability

We propose a novel long-term signal variability (LTSV) measure, which describes the degree of nonstationarity of a signal. We analyze the LTSV measure both analytically and empirically for speech and for various stationary and nonstationary noises. Based on this analysis, we find that the LTSV measure can discriminate noise from noisy speech and is therefore a potential feature for voice activity detection (VAD). We describe an LTSV-based VAD scheme and evaluate its performance under eleven types of noise and five signal-to-noise ratio (SNR) conditions. Comparison with standard VAD schemes shows that the accuracy of the LTSV-based VAD scheme, averaged over all noises and all SNRs, is approximately 6% (absolute) better than that of the best among the considered VAD schemes, namely AMR-VAD2. We also find that, at -10 dB SNR, the VAD accuracies of the proposed LTSV-based scheme and the best considered VAD scheme are 88.49% and 79.30%, respectively. This improvement in VAD accuracy indicates the robustness of the LTSV feature for VAD at low SNR for most of the noises considered.
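The abstract does not state how the LTSV measure is computed, so the following is only a minimal sketch under an assumed formulation: for each frequency bin, take the entropy of the power spectrum normalized over the last R frames, then take the variance of those entropies across bins. The function name `ltsv`, the parameter `R`, the `eps` guard, and the toy spectra are all illustrative assumptions, not details taken from the text above.

```python
import math

def ltsv(power_spec, R):
    """Return one LTSV-style value per frame m >= R-1 (assumed formulation).

    power_spec: list of frames, each a list of K non-negative power values
                (a short-time power spectrum).
    R:          number of frames in the long-term analysis window.
    """
    eps = 1e-12  # hypothetical guard against log(0) and division by zero
    num_bins = len(power_spec[0])
    values = []
    for m in range(R - 1, len(power_spec)):
        window = power_spec[m - R + 1 : m + 1]  # the last R frames up to m
        entropies = []
        for k in range(num_bins):
            # Normalize bin k over the window so it behaves like a distribution.
            track = [frame[k] + eps for frame in window]
            total = sum(track)
            entropies.append(-sum((s / total) * math.log(s / total) for s in track))
        # Variance of the per-bin entropies across frequency.
        mean = sum(entropies) / num_bins
        values.append(sum((e - mean) ** 2 for e in entropies) / num_bins)
    return values

# Toy input: a perfectly stationary spectrum, and a copy with one energy burst.
flat = [[1.0] * 4 for _ in range(10)]
burst = [row[:] for row in flat]
burst[5][0] = 100.0  # burst in bin 0 at frame 5

stationary_ltsv = ltsv(flat, R=5)
bursty_ltsv = ltsv(burst, R=5)
```

In this toy case the stationary spectrum gives values of essentially zero, while every window that contains the burst gives a clearly positive value. That is the kind of separation a threshold-based detector could exploit, although the actual decision rule used by the proposed scheme is not described in the abstract.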
