Gaussian process regression for voice activity detection and speech enhancement

The Gaussian process (GP) model is a flexible nonparametric Bayesian method widely used in regression and classification. In this paper we present a probabilistic method that solves voice activity detection (VAD) and speech enhancement in a single GP regression framework, modeling clean speech with a GP smoother. Optimizing the hyperparameters of the GP models yields a novel VAD method, because the learned length-scale parameters of the covariance functions differ markedly between voiced and unvoiced frames. Clean speech is estimated by the posterior means of the GP models. Numerical experiments confirm the validity of our method.
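The idea can be illustrated with a minimal sketch, which is not the implementation used in the paper. It assumes a squared-exponential covariance function, fixed signal and noise variances, and a grid search over the log marginal likelihood in place of gradient-based hyperparameter optimization. The function names (`rbf_kernel`, `fit_frame`) and the two synthetic frames are hypothetical and serve only to show that a smooth frame and a noise-like frame lead to different learned length-scales, and that the posterior mean acts as a smoother.

```python
import numpy as np

def rbf_kernel(t, length_scale, signal_var=1.0):
    # Squared-exponential covariance (an assumed choice of kernel).
    d = t[:, None] - t[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def fit_frame(t, y, length_scales, noise_var=0.01):
    # Pick the length-scale that maximizes the GP log marginal likelihood
    # and return it together with the posterior mean at the inputs t.
    best = None
    n = len(t)
    for ell in length_scales:
        Kf = rbf_kernel(t, ell)
        K = Kf + noise_var * np.eye(n)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
        lml = (-0.5 * y @ alpha
               - np.log(np.diag(L)).sum()      # equals -0.5 * log|K|
               - 0.5 * n * np.log(2 * np.pi))
        if best is None or lml > best[0]:
            best = (lml, ell, Kf @ alpha)      # posterior mean = Kf K^{-1} y
    return best[1], best[2]

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)
grid = np.logspace(-3, 0, 20)                  # candidate length-scales

# Synthetic stand-ins for frames (assumptions, not real speech data).
clean = np.sin(2 * np.pi * 3 * t)              # smooth, periodic frame
noisy = clean + 0.1 * rng.standard_normal(64)  # observed noisy version
noise_only = rng.standard_normal(64)           # noise-like frame

ell_smooth, mean_smooth = fit_frame(t, noisy, grid)
ell_noise, _ = fit_frame(t, noise_only, grid)
print("length-scale (smooth frame):", ell_smooth)
print("length-scale (noise-like frame):", ell_noise)
```

In this sketch a VAD decision would come from thresholding the learned length-scale per frame, and `mean_smooth` plays the role of the enhanced signal. The threshold and its direction would have to be calibrated on real data.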
