Paper information - Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability

Voice activity detection (VAD) is an important pre-processing step that is widely used in speech-based systems. This paper proposes a robust VAD algorithm. The proposed algorithm uses the sub-band temporal envelope and the sub-band long-term signal variability to distinguish speech from non-speech, including both stationary and non-stationary noise. The two features are combined through decision fusion to make a robust VAD decision. The proposed algorithm is also unsupervised and low-complexity, and it operates without pre-trained models. The experimental results show that the proposed algorithm outperforms the baseline algorithms and handles a variety of noise environments over a wide range of signal-to-noise ratios. The proposed algorithm could be applied to speech-based systems.
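The abstract does not give the feature definitions, thresholds, or fusion rule, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes NumPy and makes several illustrative choices that are not from the paper: frame-level sub-band energy serves as a coarse stand-in for the sub-band temporal envelope, long-term variability is measured as the entropy deficit of each sub-band's energy over a trailing window, and the two per-band decisions are fused with a logical AND. All function names, the band count, the window length, and both thresholds are hypothetical.

```python
import numpy as np


def subband_energy(x, frame_len=320, hop=160, n_fft=512, n_bands=4):
    """Per-frame energy in equal-width frequency bands: (n_frames, n_bands)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    # Small offset keeps the logarithms below finite on digital silence.
    return np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-12


def envelope_above_floor_db(energy, floor_percentile=10):
    """Sub-band level in dB relative to a per-band noise floor.

    Assumption: at least `floor_percentile` percent of frames are non-speech,
    so a low percentile of the level approximates the noise floor.
    """
    level = 10.0 * np.log10(energy)
    floor = np.percentile(level, floor_percentile, axis=0, keepdims=True)
    return level - floor


def long_term_variability(energy, window=30):
    """Entropy deficit of each band's energy over the last `window` frames.

    Near 0 when the band energy is stationary, larger when the energy is
    concentrated in only some of the frames (as with syllabic modulation).
    """
    out = np.zeros_like(energy)
    for m in range(energy.shape[0]):
        seg = energy[max(0, m - window + 1): m + 1]
        p = seg / seg.sum(axis=0, keepdims=True)
        entropy = -(p * np.log(p)).sum(axis=0)
        out[m] = np.log(len(seg)) - entropy
    return out


def detect_speech(x, env_threshold_db=6.0, var_threshold=0.1):
    """Return one boolean per frame: True where a band passes both tests."""
    energy = subband_energy(np.asarray(x, dtype=float))
    env = envelope_above_floor_db(energy)
    var = long_term_variability(energy)
    # Illustrative fusion rule (AND per band, OR across bands).
    band_active = (env > env_threshold_db) & (var > var_threshold)
    return band_active.any(axis=1)
```

With the default settings and 16 kHz audio, `detect_speech(x)` returns one decision per 10 ms hop. The sketch needs no training data, which matches the unsupervised setting described above, but the thresholds would have to be tuned for real recordings.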
