Noise-Robust Voice Activity Detector Based on Four States-Based HMM

Voice activity detection (VAD) is increasingly essential in noisy environments for accurate speech recognition. In this paper, we propose a method based on a left-right hidden Markov model (HMM) to identify the start and end of speech. Instead of the existing two-state approach, the method builds two models, one for non-speech and one for speech; formally, each model may include several states. We also analyze other features of speech and non-speech, such as pitch index, pitch magnitude, and fractal dimension. We compare the VAD results of the proposed algorithm with those of a two-state HMM. Experiments show that the proposed method performs better than two-state HMMs in VAD, especially in low signal-to-noise ratio (SNR) environments.
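To illustrate the idea of decoding speech boundaries with a multi-state left-right HMM, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes a hypothetical topology of two non-speech states (N1, N2) and two speech states (S1, S2), a single scalar feature per frame (for example, a normalized log-energy), and Gaussian emissions. All transition probabilities, means, and variances are invented for illustration; the names `viterbi` and `speech_segments` are likewise hypothetical.

```python
import math

NEG_INF = float("-inf")

# Hypothetical four-state topology (an assumption, not taken from the paper):
# states 0-1 form the non-speech model, states 2-3 form the speech model.
# Each model is left-right, and the last state of one model hands over
# to the first state of the other.
STATES = ["N1", "N2", "S1", "S2"]
IS_SPEECH = [False, False, True, True]

# Illustrative transition probabilities (each row sums to 1).
TRANS = [
    [0.8, 0.2, 0.0, 0.0],  # N1 -> N1 | N2
    [0.0, 0.9, 0.1, 0.0],  # N2 -> N2 | S1
    [0.0, 0.0, 0.8, 0.2],  # S1 -> S1 | S2
    [0.1, 0.0, 0.0, 0.9],  # S2 -> S2 | N1
]
START = [1.0, 0.0, 0.0, 0.0]  # assume every recording starts in non-speech

# Illustrative Gaussian emission parameters per state (assumed values).
MEANS = [0.0, 0.5, 4.0, 5.0]
VARS = [1.0, 1.0, 1.0, 1.0]


def log_gauss(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def safe_log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs):
    """Return the most likely state index sequence for the observations."""
    n = len(STATES)
    delta = [safe_log(START[s]) + log_gauss(obs[0], MEANS[s], VARS[s])
             for s in range(n)]
    back = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            # Best predecessor of state j at this frame.
            best_i = max(range(n),
                         key=lambda i: delta[i] + safe_log(TRANS[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + safe_log(TRANS[best_i][j])
                             + log_gauss(x, MEANS[j], VARS[j]))
        back.append(ptr)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def speech_segments(path):
    """Convert a state path into (start_frame, end_frame) speech segments."""
    segments, start = [], None
    for t, s in enumerate(path):
        if IS_SPEECH[s] and start is None:
            start = t
        elif not IS_SPEECH[s] and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(path) - 1))
    return segments


frames = [0.1, -0.2, 0.3, 4.5, 5.2, 4.8, 5.1, 0.0, 0.2, -0.1]
print(speech_segments(viterbi(frames)))
```

With these assumed parameters, the toy sequence above decodes to a single speech segment spanning frames 3 through 6. Because each model is left-right, a brief noise burst cannot flip the decision for a single frame: the path must traverse both states of a model before switching to the other one, which is one plausible reason a multi-state topology could be more robust than a two-state HMM at low SNR.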
