Paper information - Efficient voice activity detection algorithm using long-term spectral flatness measure

This paper proposes a novel and robust voice activity detection (VAD) algorithm based on a long-term spectral flatness measure (LSFM), which is capable of working at signal-to-noise ratios (SNRs) of 10 dB and lower. The LSFM-based VAD improves speech detection robustness in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative power of the LSFM feature is shown by analyzing the LSFM distributions of speech and non-speech. The proposed algorithm was evaluated under 12 types of noise (11 from NOISEX-92, plus speech-shaped noise) and five SNR levels on the core TIMIT test corpus. Comparisons with three modern standardized algorithms (ETSI adaptive multi-rate (AMR) options AMR1 and AMR2, and ITU-T G.729) show that the proposed LSFM-based VAD scheme achieves the best average accuracy rate. The proposed method is also compared with a VAD scheme based on long-term signal variability (LTSV). The results show that the proposed algorithm outperforms the LTSV-based scheme for most of the noises considered, including difficult noises such as machine gun noise and speech babble.
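The abstract does not give the LSFM formula, so the following is only a minimal sketch under stated assumptions. It assumes that the feature sums, over frequency bins, the log ratio of the geometric mean to the arithmetic mean of a smoothed power spectrum taken across the most recent frames, and that the low-variance spectrum estimate is a plain moving average of periodograms. The function names and the parameters `frame_len`, `hop`, `n_fft`, `avg_frames`, and `context` are hypothetical and are not taken from the paper.

```python
import numpy as np


def frame_power_spectra(x, frame_len=320, hop=160, n_fft=512):
    """Hann-windowed short-time power spectra, shape (n_frames, n_fft // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2


def lsfm(power, avg_frames=10, context=30, eps=1e-12):
    """Long-term spectral flatness per frame (assumed formulation, see lead-in).

    Values are <= 0; values near 0 mean each bin's power barely changes
    over the last `context` frames (stationary noise), while strongly
    negative values mean the spectrum fluctuates over time.
    """
    n_frames = power.shape[0]

    # Assumed low-variance spectrum estimate: causal moving average of the
    # last `avg_frames` periodograms (fewer at the start of the signal).
    smoothed = np.empty_like(power)
    for m in range(n_frames):
        smoothed[m] = power[max(0, m - avg_frames + 1) : m + 1].mean(axis=0)

    log_smoothed = np.log10(smoothed + eps)
    out = np.empty(n_frames)
    for m in range(n_frames):
        lo = max(0, m - context + 1)
        # Per-bin log10 of the geometric mean over the long-term window.
        log_geo = log_smoothed[lo : m + 1].mean(axis=0)
        # Per-bin log10 of the arithmetic mean over the same window.
        log_arith = np.log10(smoothed[lo : m + 1].mean(axis=0) + eps)
        # Sum the log flatness ratio over all frequency bins.
        out[m] = np.sum(log_geo - log_arith)
    return out
```

Under these assumptions, a frame would be labeled by comparing its LSFM value against a threshold. The adaptive threshold update mentioned in the abstract is not specified there and is left out of this sketch.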

Akinori Nishihara | Yanna Ma
