Robust Voice Activity Detection Algorithm based on Long Term Dominant Frequency and Spectral Flatness Measure

In this paper, we propose a robust voice activity detection algorithm based on a long-term metric that uses dominant frequency and the spectral flatness measure. The proposed algorithm exploits the discriminating power of both features to derive the decision rule, which reduces the average number of speech detection errors. We evaluate its performance using 15 additive noises at different SNRs (-10 dB to 10 dB) and compare it with some of the most recent standard algorithms. Experiments show that the proposed algorithm achieves the best accuracy rate averaged over all SNRs and noises.
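As an illustration of the two features named above, the following minimal sketch computes a dominant frequency and a spectral flatness value for a single frame. It is not the algorithm from the paper: the function name `frame_features`, the naive DFT, the frame length, the sample rate, and the small constant `eps` are all assumptions chosen for the example, and the long-term metric and decision rule are not reproduced because the abstract does not specify them.

```python
import cmath
import math
import random


def frame_features(frame, sample_rate):
    """Return (dominant_frequency_hz, spectral_flatness) for one frame.

    Hypothetical helper for illustration only; uses a naive DFT over
    bins 1 .. n/2 - 1 (DC and Nyquist are skipped).
    """
    n = len(frame)
    powers = []
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        powers.append(abs(coeff) ** 2)

    eps = 1e-12  # assumed guard against log(0) and division by zero

    # Dominant frequency: centre frequency of the bin with the most power.
    peak_bin = max(range(len(powers)), key=powers.__getitem__) + 1
    dominant_hz = peak_bin * sample_rate / n

    # Spectral flatness: geometric mean over arithmetic mean of the power spectrum.
    log_mean = sum(math.log(p + eps) for p in powers) / len(powers)
    flatness = math.exp(log_mean) / (sum(powers) / len(powers) + eps)
    return dominant_hz, flatness


# Assumed test signals: a 1 kHz tone and seeded Gaussian noise, 64 samples at 8 kHz.
sample_rate = 8000
n = 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

tone_hz, tone_flatness = frame_features(tone, sample_rate)
noise_hz, noise_flatness = frame_features(noise, sample_rate)
print(tone_hz, tone_flatness)
print(noise_hz, noise_flatness)
```

A strongly tonal frame concentrates its power in one bin, so its flatness is close to zero, while a noise-like frame spreads power across bins and yields a higher flatness. This contrast is the kind of discriminating behaviour the abstract refers to.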
