Energy-based VAD with grey magnitude spectral subtraction

In this paper, we propose a novel voice activity detection (VAD) scheme for low-SNR conditions with additive white noise. The proposed approach consists of two stages. First, grey magnitude spectral subtraction (GMSS) is applied to remove additive noise from the noisy speech, yielding an estimate of the clean speech. Second, the GMSS-enhanced speech is segmented, and each segment is passed to an energy-based VAD that classifies it as speech or non-speech. We call the combined approach GMSS/EVAD. Simulation results indicate that the proposed GMSS/EVAD outperforms the VADs in G.729 and GSM AMR for the given low-SNR examples. To investigate the performance of GMSS/EVAD under real-life background noise, the babble and volvo noises from the NOISEX-92 database are also considered. For the given examples, the simulation results indicate that GMSS/EVAD performs adequately for babble noise at SNRs above 10 dB and for volvo noise at SNRs of 15 dB and above.
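The two-stage structure described above (noise removal by magnitude spectral subtraction, then a per-segment energy decision) can be illustrated with a minimal sketch. It is not the GMSS/EVAD algorithm itself: the abstract does not specify the grey-model noise estimate, so the sketch substitutes a plain per-bin average over the first few frames, which are assumed to be speech-free. The function names, the frame length, the non-overlapping unwindowed framing, the threshold value, and the synthetic tone standing in for speech are all illustrative assumptions.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stdlib only; fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frames, noise_frames=5):
    """Plain magnitude spectral subtraction.

    ASSUMPTION: the noise magnitude is the per-bin mean over the first
    `noise_frames` frames, taken to be speech-free. The paper's grey-model
    estimate is not reproduced here.
    """
    spectra = [dft(f) for f in frames]
    n_bins = len(spectra[0])
    noise_mag = [sum(abs(s[k]) for s in spectra[:noise_frames]) / noise_frames
                 for k in range(n_bins)]
    enhanced = []
    for s in spectra:
        # Subtract the noise magnitude, floor at zero, keep the noisy phase.
        cleaned = [cmath.rect(max(abs(s[k]) - noise_mag[k], 0.0), cmath.phase(s[k]))
                   for k in range(n_bins)]
        enhanced.append(idft(cleaned))
    return enhanced


def energy_vad(frames, threshold):
    """Label a frame as speech (True) when its mean energy exceeds `threshold`."""
    return [sum(v * v for v in f) / len(f) > threshold for f in frames]


def demo(seed=0, frame_len=64, n_frames=30):
    """Synthetic check: a tone in frames 10..19 stands in for speech."""
    rng = random.Random(seed)
    frames = []
    for i in range(n_frames):
        frame = [rng.gauss(0.0, 0.3) for _ in range(frame_len)]  # white noise
        if 10 <= i < 20:
            # Add a unit-amplitude tone with 8 cycles per frame.
            frame = [v + math.sin(2 * math.pi * 8 * t / frame_len)
                     for t, v in enumerate(frame)]
        frames.append(frame)
    return energy_vad(spectral_subtract(frames), threshold=0.1)


print(demo())
```

In this toy setting the subtraction step lowers the energy of noise-only frames well below that of the tone frames, so a fixed threshold separates them. A practical detector would use overlapping windowed frames, an FFT, and a threshold adapted to the noise level.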
