An improved robust statistical voice activity detection based on sub-band periodic intensity

An investigation of voice activity detection (VAD) based on the statistical-model likelihood ratio test revealed a false alarm problem when detecting non-verbal vocalization signals. In this paper, an improved statistical model-based VAD method is proposed for adverse noise environments, which employs a reserved coefficient in the decision rule. The reserved coefficient is determined by the sub-band periodic intensity, where the sub-bands are divided according to the characteristics of human auditory perception. The final decision depends on the geometric mean of the reserved sub-band likelihood ratios. Simulations carried out on the CADCC and NOISEX-92 databases show promising performance in comparison with traditional robust VAD methods under both stationary and nonstationary noise conditions, in terms of an improved false alarm rate and receiver operating characteristic (ROC) curve.
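
The decision rule described above can be illustrated with a minimal sketch. The abstract does not specify how the reserved coefficient is computed from the periodic intensity, so the example assumes a binary coefficient: a sub-band is reserved when its periodic intensity reaches a threshold. The function name, both thresholds, and the fallback for the case where no sub-band is reserved are hypothetical and are not taken from the paper.

```python
import math


def reserved_geometric_mean_vad(sub_band_lrs, periodic_intensity,
                                intensity_threshold=0.5,
                                decision_threshold=1.0):
    """Return (is_speech, geometric_mean) for one frame.

    sub_band_lrs: positive likelihood ratios, one per sub-band.
    periodic_intensity: periodicity measure per sub-band (same length).

    ASSUMPTION: the reserved coefficient is binary (1 if the periodic
    intensity is at least intensity_threshold, else 0). The paper may
    define it differently.
    """
    if len(sub_band_lrs) != len(periodic_intensity):
        raise ValueError("one periodic intensity value is needed per sub-band")

    # Keep the log-likelihood ratios of the reserved sub-bands only.
    reserved_logs = [
        math.log(lr)
        for lr, intensity in zip(sub_band_lrs, periodic_intensity)
        if intensity >= intensity_threshold
    ]

    # ASSUMPTION: with no reserved sub-band, the frame is treated as non-speech.
    if not reserved_logs:
        return False, 0.0

    # Geometric mean computed as exp(mean of logs) for numerical stability.
    geometric_mean = math.exp(sum(reserved_logs) / len(reserved_logs))
    return geometric_mean > decision_threshold, geometric_mean


# Example frame: the third sub-band has low periodic intensity and is dropped.
is_speech, score = reserved_geometric_mean_vad(
    [4.0, 1.0, 0.25], [0.9, 0.8, 0.1]
)
```

Working in the log domain turns the geometric mean into an arithmetic mean of log-likelihood ratios, which avoids overflow or underflow when many sub-band ratios are multiplied.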
