A maximum log-likelihood approach to voice activity detection

Modern voice activity detection (VAD) algorithms must operate reliably at low signal-to-noise ratios (SNR). Although considerable research has addressed this problem, the performance of existing VAD algorithms remains far from ideal. In this paper, we present a novel VAD algorithm that combines Teager energy cepstral coefficients, used as a noise-robust feature, with Gaussian mixture models that classify speech and silence periods. The proposed solution eliminates the threshold method used in many noise-robust VAD algorithms, which favors its use in real applications. The performance of the algorithm was tested under known and unknown noise statistics and compared with a statistical model-based approach from the literature. The results show that the proposed solution achieves better accuracy and significantly reduces clipping of speech periods, and thus yields superior signal quality.
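The threshold-free decision described above can be illustrated with a minimal sketch. It assumes that each frame is assigned to whichever of two Gaussian mixture models (one for speech, one for silence) gives the higher log-likelihood, which is one plausible reading of the title and not necessarily the exact procedure of the paper. The function names and all model parameters are hypothetical, and the full Teager energy cepstral coefficient pipeline (auditory filter bank, cepstral transform) is replaced here by a single toy feature, the log of the mean absolute Teager energy of a frame.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def toy_feature(frame, eps=1e-12):
    """Toy stand-in for a cepstral feature (assumption, not the paper's TECC):
    log of the mean absolute Teager energy of the frame."""
    psi = teager_energy(frame)
    return [math.log(sum(abs(p) for p in psi) / len(psi) + eps)]

def gmm_log_likelihood(v, weights, means, variances):
    """Log-likelihood of feature vector v under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for vi, mi, si in zip(v, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * si) + (vi - mi) ** 2 / si)
        comp_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def is_speech(v, speech_gmm, silence_gmm):
    """Pick the class with the larger log-likelihood; no tuned threshold."""
    return gmm_log_likelihood(v, *speech_gmm) > gmm_log_likelihood(v, *silence_gmm)

# Hypothetical one-dimensional models as (weights, means, variances).
# In practice these would be trained, for example with the EM algorithm.
SPEECH_GMM = ([0.5, 0.5], [[-1.0], [0.5]], [[1.0], [1.0]])
SILENCE_GMM = ([1.0], [[-8.0]], [[1.0]])

if __name__ == "__main__":
    loud = [math.cos(0.5 * n) for n in range(160)]           # strong tone
    quiet = [0.01 * math.cos(0.5 * n) for n in range(160)]   # weak tone
    print(is_speech(toy_feature(loud), SPEECH_GMM, SILENCE_GMM))
    print(is_speech(toy_feature(quiet), SPEECH_GMM, SILENCE_GMM))
```

Because the decision compares two model scores directly, there is no SNR-dependent threshold to tune; the robustness then rests on the features and on how well the two models are trained.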
