Voice Activity Detection Via Noise Reducing Using Non-Negative Sparse Coding

This letter presents a voice activity detection (VAD) approach that uses non-negative sparse coding to improve detection performance in low signal-to-noise ratio (SNR) conditions. The basic idea is to extract features from a noise-reduced representation of the original audio signal. We decompose the magnitude spectrum of an audio signal over a speech dictionary learned from clean speech and a noise dictionary learned from noise samples. Only the coefficients corresponding to the speech dictionary are retained, and they serve as the noise-reduced representation of the signal for feature extraction. A conditional random field (CRF) models the correlation between feature sequences and voice activity labels along the audio signal. We then assign voice activity labels to a given audio signal by decoding the CRF. Experimental results demonstrate that our VAD approach performs well in low SNR conditions.
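
The decomposition step can be illustrated with a minimal sketch. The abstract does not state the exact objective or solver, so this assumes a Hoyer-style non-negative sparse coding objective, 0.5 * ||X - D H||^2 + sparsity * sum(H) with H >= 0, solved by multiplicative updates while both dictionaries are held fixed. The function names, the `sparsity` weight, and the iteration count are hypothetical choices for illustration, not values taken from the letter.

```python
import numpy as np

def nnsc_activations(X, D, sparsity=0.1, n_iter=200, eps=1e-12):
    """Approximately minimise 0.5*||X - D H||_F^2 + sparsity*sum(H) over H >= 0.

    X: (n_freq, n_frames) non-negative magnitude spectrogram.
    D: (n_freq, n_atoms) non-negative dictionary, kept fixed.
    Assumed solver: multiplicative updates for the activations only.
    """
    rng = np.random.default_rng(0)
    # Strictly positive start so the multiplicative updates can move every entry.
    H = rng.random((D.shape[1], X.shape[1])) + eps
    DtX = D.T @ X
    DtD = D.T @ D
    for _ in range(n_iter):
        # The update keeps H non-negative because every factor is non-negative.
        H *= DtX / (DtD @ H + sparsity + eps)
    return H

def speech_representation(X, D_speech, D_noise, **kwargs):
    """Decompose X over [D_speech, D_noise] and keep only the speech coefficients."""
    D = np.hstack([D_speech, D_noise])
    H = nnsc_activations(X, D, **kwargs)
    # Rows are ordered like the dictionary columns: speech atoms first.
    return H[: D_speech.shape[1]]
```

Under these assumptions, `speech_representation(X, D_speech, D_noise)` returns an array with one row per speech atom and one column per frame. Features for the CRF would be extracted from this array; that step is not shown because the abstract does not specify the features.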
