Speech/non-speech discrimination combining advanced feature extraction and SVM learning

This paper presents an effective speech/non-speech discrimination method for improving the performance of speech processing systems operating in noisy environments. The proposed method uses a trained support vector machine (SVM) that defines an optimized non-linear decision rule over different sets of speech features. Two alternative feature extraction processes are compared, based on: i) subband SNR estimation after denoising, and ii) long-term SNR estimation. With both, the SVM-based classifier is able to learn how the signal is masked by the acoustic noise and to define an effective non-linear decision rule. However, a feature vector incorporating contextual information yields better speech/non-speech discrimination, even when no denoising is applied. The experimental analysis carried out on the Spanish SpeechDat-Car database shows clear improvements over standard voice activity detectors (VADs), including ITU G.729, ETSI AMR and ETSI AFE for distributed speech recognition (DSR), and over other recently reported VADs.

Index Terms: voice activity detection, support vector machine learning, speech enhancement.
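
The abstract does not give the feature or classifier equations, so the following is only a minimal sketch of the general idea, under stated assumptions: a long-term subband SNR is taken here as the maximum subband energy over a window of neighbouring frames divided by a noise estimate, and the decision rule is a generic RBF-kernel SVM decision function. All function names, the window order, the noise estimate, the support vectors and the coefficients are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def subband_snr_db(band_energies, noise_energies, floor=1e-12):
    """Per-subband SNR in dB for one frame (energies are linear, not dB)."""
    return [10.0 * math.log10(max(e, floor) / max(n, floor))
            for e, n in zip(band_energies, noise_energies)]

def long_term_snr_db(frames, noise_energies, t, order):
    """Subband SNR built from contextual information (an assumed formulation):
    the energy in each band is the maximum over frames t-order .. t+order."""
    lo = max(0, t - order)
    hi = min(len(frames), t + order + 1)
    n_bands = len(noise_energies)
    envelope = [max(frames[j][k] for j in range(lo, hi)) for k in range(n_bands)]
    return subband_snr_db(envelope, noise_energies)

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Generic kernel SVM decision value: sum_i coef_i * K(sv_i, x) + bias.
    The frame is labelled speech when the value is positive."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Toy data (hypothetical): 3 frames, 2 subbands, flat unit noise estimate.
frames = [[1.0, 1.0], [100.0, 1.0], [1.0, 1.0]]
noise = [1.0, 1.0]

# Hypothetical "trained" model: one speech-like and one noise-like support vector.
support_vectors = [[20.0, 0.0], [0.0, 0.0]]
dual_coefs = [1.0, -1.0]   # signed dual coefficients (alpha_i * y_i)
bias, gamma = 0.0, 0.01

features = long_term_snr_db(frames, noise, t=0, order=1)
is_speech = svm_decision(features, support_vectors, dual_coefs, bias, gamma) > 0
```

In this toy case frame 0 is low-energy on its own, but the window of order 1 reaches the high-energy neighbouring frame, so the contextual feature vector falls on the speech side of the decision rule. In practice the support vectors and coefficients would come from training on labelled frames rather than being set by hand.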
