Comparison of Speech Activity Detection Techniques for Speaker Recognition

Speech activity detection (SAD) is an essential component of many speech processing applications, and the performance of speech-based tasks depends strongly on the effectiveness of the SAD stage. In this paper, we systematically review several popular SAD techniques and their application to speaker recognition. Speaker verification systems using different SAD techniques are experimentally evaluated on NIST speech corpora with a Gaussian mixture model-universal background model (GMM-UBM) classifier under clean and noisy conditions. We find that SAD based on two-Gaussian modeling performs comparatively better than the other SAD techniques across different types of noise.
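The abstract does not spell out how the two-Gaussian SAD is formulated, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes that frame log-energies are modeled by a two-component one-dimensional Gaussian mixture fitted with EM, and that a frame is labeled speech when the higher-mean component explains it better. The function names, frame length, hop size, and variance floor are all illustrative assumptions.

```python
import math


def frame_log_energies(signal, frame_len=160, hop=80):
    """Log of the mean squared amplitude of each frame (floored to avoid log(0))."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(math.log(sum(x * x for x in frame) / frame_len + 1e-10))
    return energies


def _log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_two_gaussians(values, n_iter=50, var_floor=1e-3):
    """EM for a 1-D two-component mixture; returns (weights, means, variances)."""
    n = len(values)
    mean_all = sum(values) / n
    var_all = max(sum((v - mean_all) ** 2 for v in values) / n, var_floor)
    # Assumed initialisation: one component at each extreme of the data.
    weights = [0.5, 0.5]
    means = [min(values), max(values)]
    variances = [var_all, var_all]
    for _ in range(n_iter):
        # E-step: posterior of each component, computed in the log domain.
        resp = []
        for v in values:
            lp = [math.log(weights[k]) + _log_gauss(v, means[k], variances[k])
                  for k in (0, 1)]
            top = max(lp)
            p = [math.exp(l - top) for l in lp]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and (floored) variance.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk,
                var_floor,
            )
    return weights, means, variances


def two_gaussian_sad(signal, frame_len=160, hop=80):
    """Return one boolean per frame: True where the frame is labeled speech."""
    energies = frame_log_energies(signal, frame_len, hop)
    weights, means, variances = fit_two_gaussians(energies)
    speech = 0 if means[0] > means[1] else 1  # higher-energy component
    other = 1 - speech
    labels = []
    for e in energies:
        lp = [math.log(weights[k]) + _log_gauss(e, means[k], variances[k])
              for k in (0, 1)]
        labels.append(lp[speech] > lp[other])
    return labels
```

In a speaker verification front end of the kind evaluated here, only the frames labeled True would be passed on to feature extraction and GMM-UBM scoring. The sketch fits the mixture per utterance, which presumes that each utterance contains both speech and non-speech frames.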
