Voice Activity Detection Using MFCC Features and Support Vector Machine

We define voice activity detection (VAD) as a binary classification problem and solve it using a support vector machine (SVM). Challenges in the SVM-based approach include the selection of representative training segments, the selection of features, the normalization of those features, and the post-processing of the frame-level decisions. We propose to construct an SVM-VAD using MFCC features because they capture the most relevant information in speech and are widely used in speech and speaker recognition, which makes the proposed method easy to integrate with existing applications. Practical usability is our driving motivation: the proposed SVM-VAD should be easy to adapt to new conditions.
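
Two of the steps named above, feature normalization and post-processing of frame-level decisions, can be illustrated with a minimal sketch. The specific choices here (per-utterance mean and variance normalization, majority-vote smoothing over a sliding window) and the function names `normalize_features` and `smooth_decisions` are assumptions made for illustration; the abstract does not say which normalization or smoothing scheme the method uses. The MFCC extraction and the SVM classifier itself are left out.

```python
from statistics import mean, pstdev


def normalize_features(frames):
    """Scale each feature dimension to zero mean and unit variance.

    `frames` is a list of equal-length feature vectors (e.g. MFCCs),
    one per frame. This is an assumed normalization scheme, not
    necessarily the one used in the paper.
    """
    dims = list(zip(*frames))                   # one tuple per feature dimension
    mus = [mean(d) for d in dims]
    sigmas = [pstdev(d) or 1.0 for d in dims]   # avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(f, mus, sigmas)] for f in frames]


def smooth_decisions(decisions, window=5):
    """Majority vote over a sliding window of 0/1 frame decisions.

    An assumed post-processing step: isolated speech or non-speech
    frames are overruled by their neighbours.
    """
    half = window // 2
    smoothed = []
    for i in range(len(decisions)):
        segment = decisions[max(0, i - half): i + half + 1]
        smoothed.append(1 if 2 * sum(segment) > len(segment) else 0)
    return smoothed


if __name__ == "__main__":
    raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]     # hypothetical classifier output
    print(smooth_decisions(raw, window=3))
```

In this sketch, normalized vectors would be passed to the classifier, and the classifier's per-frame outputs would then be smoothed; a wider window suppresses more spurious frames at the cost of blurring short speech segments.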
