Paper information - Speech/non-speech discrimination combining advanced feature extraction and SVM learning

This paper presents an effective speech/non-speech discrimination method for improving the performance of speech processing systems working in noisy environments. The proposed method uses a trained support vector machine (SVM) that defines an optimized non-linear decision rule over different sets of speech features. Two alternative feature extraction processes are compared, based on i) subband SNR estimation after denoising and ii) long-term SNR estimation. Both methods show the ability of the SVM-based classifier to learn how the signal is masked by the acoustic noise and to define an effective non-linear decision rule. However, a feature vector incorporating contextual information is shown to yield better speech/non-speech discrimination even when no denoising is applied. The experimental analysis carried out on the Spanish SpeechDat-Car database shows clear improvements over standard VADs, including ITU G.729, ETSI AMR and ETSI AFE for distributed speech recognition (DSR), and over other recently reported VADs.

Index Terms: voice activity detection, support vector machine learning, speech enhancement.
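The idea summarized in the abstract, a non-linear SVM decision rule applied to SNR-based features that include context from neighboring frames, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of a maximum over a window of `context` frames as the long-term envelope, the RBF kernel, and all numeric values (including the hand-picked support vectors, which are not learned from data) are assumptions made purely for illustration.

```python
import math


def long_term_subband_snr(band_energies, noise_energies, t, context=2):
    """Return one long-term SNR value (in dB) per subband for frame t.

    Assumption: the long-term envelope of each band is the maximum energy
    over frames t-context .. t+context (clipped at the signal edges).
    """
    lo = max(0, t - context)
    hi = min(len(band_energies), t + context + 1)
    features = []
    for k, noise in enumerate(noise_energies):
        envelope = max(frame[k] for frame in band_energies[lo:hi])
        features.append(10.0 * math.log10(envelope / noise))
    return features


def rbf_svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i c_i * exp(-gamma * ||x - s_i||^2) + bias.

    dual_coefs[i] is the product alpha_i * y_i for support vector s_i.
    """
    total = bias
    for s, c in zip(support_vectors, dual_coefs):
        dist2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += c * math.exp(-gamma * dist2)
    return total


# Toy data (hypothetical): 5 frames, 2 subbands, speech in frames 2 and 3.
energies = [[1.0, 1.0], [1.0, 1.2], [50.0, 20.0], [40.0, 30.0], [1.1, 1.0]]
noise = [1.0, 1.0]

# Hand-picked stand-in for a trained model (hypothetical values).
support_vectors = [[0.0, 0.0], [15.0, 15.0]]  # low-SNR and high-SNR prototypes
dual_coefs = [-1.0, 1.0]                      # non-speech and speech classes


def is_speech(feature_vector):
    """Classify a frame as speech when the SVM decision value is positive."""
    score = rbf_svm_decision(feature_vector, support_vectors, dual_coefs,
                             bias=0.0, gamma=0.01)
    return score > 0.0
```

In this toy setup, widening `context` lets a frame next to a speech segment inherit the high envelope of its neighbors, which is one plausible way contextual information can help near word boundaries. In practice the support vectors and coefficients would come from training on labeled frames rather than being set by hand.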

Juan Manuel Górriz | Javier Ramírez | José C. Segura | Pablo Yélamos | Luz García

[1] Nello Cristianini, et al. Advances in Kernel Methods - Support Vector Learning, 1999.

[2] E. Shlomot, et al. ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications, 1997, IEEE Commun. Mag.

[3] Dong Enqing, et al. Applying support vector machines to voice activity detection, 2002, 6th International Conference on Signal Processing, 2002.

[4] Dong Enqing, et al. Low bit and variable rate speech coding using local cosine transform, 2002, 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering. TENCOM '02. Proceedings.

[6] Qiru Zhou, et al. Robust endpoint detection and energy normalization for real-time speech and speaker recognition, 2002, IEEE Trans. Speech Audio Process.

[7] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[8] Birger Kollmeier, et al. Speech pause detection for noise spectrum estimation by tracking power envelope dynamics, 2002, IEEE Trans. Speech Audio Process.

[9] Javier Ramírez, et al. Efficient voice activity detection algorithms using long-term speech information, 2004, Speech Commun.

[10] Khalid Choukri, et al. SPEECHDAT-CAR. A Large Speech Database for Automotive Environments, 2000, LREC.

[11] Juan Manuel Górriz, et al. SVM-Enabled Voice Activity Detection, 2006, ISNN.

[12] Wonyong Sung, et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.

[13] Yan Liu, et al. A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech, 2004, 2004 International Symposium on Chinese Spoken Language Processing.

[14] Chungyong Lee, et al. Robust voice activity detection algorithm for estimating noise spectrum, 2000.