Paper information - Energy-based VAD with grey magnitude spectral subtraction


This paper proposes a voice activity detection (VAD) scheme for low-SNR conditions with additive white noise. The approach has two stages. First, grey magnitude spectral subtraction (GMSS) is applied to the noisy speech to remove the additive noise, which yields an estimate of the clean speech. Second, the GMSS-enhanced speech is segmented, and each segment is passed to an energy-based VAD that classifies it as speech or non-speech. The combined scheme is called GMSS/EVAD. Simulation results indicate that GMSS/EVAD outperforms the VADs in G.729 and GSM AMR on the given low-SNR examples. To assess its performance under real-life background noise, the babble and volvo noises from the NOISEX-92 database are also considered. For the given examples, the results indicate that GMSS/EVAD handles babble noise adequately at SNRs above 10 dB and volvo noise at SNRs of 15 dB and above.
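The two-stage structure described in the abstract (noise removal, then an energy decision per segment) can be illustrated with a minimal Python sketch. It is not the authors' method: the grey-model part of GMSS is not specified in the abstract, so plain magnitude spectral subtraction is swapped in, with the noise magnitude estimated as the average spectrum of the first few frames, which are assumed to contain no speech. The function names, the frame length of 32 samples, the four leading noise frames, and the threshold ratio of 4.0 are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frame, noise_mag):
    """Plain magnitude spectral subtraction (not GMSS): subtract the noise
    magnitude estimate in each bin, clip at zero, and keep the noisy phase."""
    enhanced = []
    for value, noise in zip(dft(frame), noise_mag):
        magnitude = max(abs(value) - noise, 0.0)  # half-wave rectification
        enhanced.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(enhanced)


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def subtract_then_energy_vad(signal, frame_len=32, noise_frames=4, ratio=4.0):
    """Return one boolean per frame: True where the enhanced frame's energy
    exceeds `ratio` times the mean enhanced energy of the leading frames.
    Assumption: the first `noise_frames` frames contain noise only."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # Stage 1: estimate the noise magnitude spectrum and enhance every frame.
    lead_mags = [[abs(v) for v in dft(f)] for f in frames[:noise_frames]]
    noise_mag = [sum(column) / noise_frames for column in zip(*lead_mags)]
    energies = [frame_energy(spectral_subtract(f, noise_mag)) for f in frames]
    # Stage 2: energy-based decision on the enhanced frames.
    threshold = ratio * sum(energies[:noise_frames]) / noise_frames
    return [energy > threshold for energy in energies]


# Synthetic example: white Gaussian noise throughout, with a sine tone
# standing in for speech in frames 4 to 7.
rng = random.Random(0)
signal = [rng.gauss(0.0, 0.2) for _ in range(12 * 32)]
for t in range(4 * 32, 8 * 32):
    signal[t] += math.sin(2 * math.pi * 4 * t / 32)

decisions = subtract_then_energy_vad(signal)
print(decisions)
```

Because the threshold is defined relative to the residual energy left in the leading frames after subtraction, the decision adapts to the noise level rather than relying on a fixed absolute energy. A real evaluation would use recorded speech and a proper noise tracker instead of the fixed leading-frame assumption.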

Cheng-Hsiung Hsieh | Po-Chin Huang | Ting-Yu Feng
