Paper Information - Voice Activity Detection Via Noise Reducing Using Non-Negative Sparse Coding

Voice Activity Detection Via Noise Reducing Using Non-Negative Sparse Coding

This letter presents a voice activity detection (VAD) approach that uses non-negative sparse coding to improve detection performance in low signal-to-noise ratio (SNR) conditions. The basic idea is to extract features from a noise-reduced representation of the original audio signal. We decompose the magnitude spectrum of an audio signal over a speech dictionary learned from clean speech and a noise dictionary learned from noise samples. Only the coefficients corresponding to the speech dictionary are kept and used as the noise-reduced representation of the signal for feature extraction. A conditional random field (CRF) models the correlation between feature sequences and voice activity labels along the audio signal. We then assign voice activity labels to a given audio signal by decoding the CRF. Experimental results demonstrate that our VAD approach performs well in low SNR conditions.
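
The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the dictionaries are already learned and held fixed, and it uses the standard multiplicative update for non-negative sparse coding with an L1 penalty. The function names (`nnsc_coefficients`, `speech_representation`) and the `sparsity` and `n_iter` parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def nnsc_coefficients(V, W, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative coefficients H such that V ~= W @ H.

    V: (n_freq, n_frames) non-negative magnitude spectrogram.
    W: (n_freq, n_atoms) fixed non-negative dictionary.
    Minimizes 0.5 * ||V - W H||^2 + sparsity * sum(H) with the
    multiplicative update H <- H * (W^T V) / (W^T W H + sparsity).
    """
    rng = np.random.default_rng(0)          # fixed seed: assumption, for repeatability
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V                           # constant across iterations
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

def speech_representation(V, W_speech, W_noise, **kwargs):
    """Decompose V over [W_speech, W_noise] and keep only the speech rows."""
    W = np.hstack([W_speech, W_noise])      # concatenated dictionary
    H = nnsc_coefficients(V, W, **kwargs)
    return H[:W_speech.shape[1]]            # coefficients of the speech atoms only
```

In this sketch, the rows returned by `speech_representation` play the role of the noise-reduced representation from which VAD features would be extracted; how the features are computed and how the CRF is trained and decoded are not covered here.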

Yunde Jia | Peng Teng

[1] Patrik O. Hoyer, et al. Non-negative sparse coding, 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[2] Juan Manuel Górriz, et al. Improved Voice Activity Detection Using Contextual Multiple Hypothesis Testing for Robust Speech Recognition, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Bhiksha Raj, et al. Speech denoising using nonnegative matrix factorization with priors, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] Yoshihiko Nankaku, et al. Voice activity detection based on conditional random fields using multiple features, 2010, INTERSPEECH.

[5] Jiqing Han, et al. Sparse power spectrum based robust voice activity detector, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Guillermo Sapiro, et al. Online Learning for Matrix Factorization and Sparse Coding, 2009, J. Mach. Learn. Res.

[7] Ji Wu, et al. Efficient Multiple Kernel Support Vector Machine Based Voice Activity Detection, 2011, IEEE Signal Processing Letters.

[8] Hoirin Kim, et al. Multiple Acoustic Model-Based Discriminative Likelihood Ratio Weighting for Voice Activity Detection, 2012, IEEE Signal Processing Letters.

[9] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[10] Wonyong Sung, et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.

[11] Ahmet M. Kondoz, et al. Improved voice activity detection based on a smoothed statistical likelihood ratio, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).