Efficient voice activity detection algorithm using long-term spectral flatness measure

This paper proposes a novel and robust voice activity detection (VAD) algorithm based on a long-term spectral flatness measure (LSFM), which is capable of working at signal-to-noise ratios (SNRs) of 10 dB and lower. The new LSFM-based VAD improves the robustness of speech detection in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative power of the LSFM feature is demonstrated through an analysis of its distributions for speech and non-speech. The proposed algorithm was evaluated on the core TIMIT test corpus under 12 types of noise (11 from NOISEX-92, plus speech-shaped noise) at five SNR levels. Comparisons with three modern standardized algorithms (the ETSI adaptive multi-rate (AMR) options AMR1 and AMR2, and ITU-T G.729) show that the proposed LSFM-based VAD scheme achieves the best average accuracy. The proposed method is also compared with a VAD scheme based on long-term signal variability (LTSV). The results show that the proposed algorithm outperforms the LTSV-based scheme for most of the noises considered, including difficult ones such as machine gun noise and speech babble.
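As a rough illustration of the idea behind a long-term spectral flatness feature, the sketch below compares, for each frequency bin, the geometric mean and the arithmetic mean of the power spectrum across a window of recent frames, and sums the log ratios over bins. This is an assumed formulation, not necessarily the exact definition used in the paper: the function names `lsfm` and `is_speech`, the `eps` guard, and the fixed `threshold` are hypothetical, and the paper's low-variance spectrum estimate and adaptive threshold are not reproduced here.

```python
import math


def lsfm(power_frames, eps=1e-12):
    """Assumed long-term spectral flatness over a window of frames.

    power_frames: list of R frames, each a list of K non-negative
    power-spectrum values (one value per frequency bin).
    Returns the sum over bins of log10(GM / AM), where GM and AM are the
    geometric and arithmetic means of that bin's power across the R frames.
    The result is at most 0 and approaches 0 when each bin is constant in time.
    """
    num_frames = len(power_frames)
    num_bins = len(power_frames[0])
    total = 0.0
    for k in range(num_bins):
        # Power of bin k across time; eps avoids log10(0).
        column = [frame[k] + eps for frame in power_frames]
        log_gm = sum(math.log10(p) for p in column) / num_frames
        log_am = math.log10(sum(column) / num_frames)
        total += log_gm - log_am
    return total


def is_speech(power_frames, threshold):
    """Hypothetical decision rule: flag speech when the feature is low.

    A fixed threshold is used only for illustration; the paper describes
    an adaptive threshold instead.
    """
    return lsfm(power_frames) < threshold


# Spectrum that does not change over time: GM equals AM in every bin.
stationary = [[2.0, 5.0, 1.0]] * 4
# Spectrum whose first bin fluctuates strongly over time.
fluctuating = [[1.0, 1.0], [100.0, 1.0]]

print(lsfm(stationary))
print(lsfm(fluctuating))
```

In this sketch a spectrum that is stationary over the window yields a value near zero, while a spectrum that fluctuates over time, as speech typically does, yields a clearly negative value, which is what makes a threshold on the feature usable as a detector.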
