DySANA: dynamic speech and noise adaptation for voice activity detection
We describe a method for simultaneously tracking noise and speech levels for signal-to-noise-ratio-adaptive speech endpoint detection. The method is based on the Kalman filter framework with switching observations and uses a dynamic distribution that (1) limits the rate of change of these levels, (2) enforces a range on the values of the two levels, and (3) enforces a ratio between the noise and signal levels. We call this a Lombard dynamic distribution, since it encodes the expectation that a speaker will increase his or her vocal intensity in noise. The method also employs a state transition matrix, which encodes a prior on the states and provides a continuity constraint. The new method provides a 46.1% relative improvement in word error rate (WER) over a baseline GMM-based endpointer at 20 dB SNR.
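The abstract gives no equations, so the following is only a rough sketch of the general idea, not the DySANA algorithm itself. It assumes per-frame energies in dB, two scalar random-walk Kalman filters (one per level), a soft two-state switch between "noise" and "speech" observations, and hard clamping as a stand-in for the Lombard dynamic distribution. The function name `track_levels` and every parameter value (`q`, `r`, `min_snr`, the transition matrix) are hypothetical choices made for illustration.

```python
import math


def gauss(x, mean, var):
    """Gaussian density, used as the observation likelihood."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def track_levels(frames_db,
                 trans=((0.9, 0.1), (0.1, 0.9)),  # assumed; rows: from noise, from speech
                 q=0.5,                           # assumed process variance (rate-of-change limit)
                 r=9.0,                           # assumed observation variance
                 level_range=(0.0, 100.0),        # assumed allowed range of levels in dB
                 min_snr=5.0):                    # assumed minimum speech-to-noise gap in dB
    """Track noise and speech levels and return (noise, speech, decisions)."""
    lo, hi = level_range
    noise, speech = frames_db[0], frames_db[0] + min_snr
    p_noise = p_speech = 25.0   # variances of the two level estimates
    prob = [0.5, 0.5]           # P(noise), P(speech) for the previous frame
    decisions = []
    for y in frames_db:
        # State prior from the transition matrix (continuity constraint).
        prior = [prob[0] * trans[0][0] + prob[1] * trans[1][0],
                 prob[0] * trans[0][1] + prob[1] * trans[1][1]]
        # Random-walk prediction: levels may drift by about sqrt(q) dB per frame.
        p_noise += q
        p_speech += q
        # Switching observation: the frame is explained by one of the two levels.
        like = [gauss(y, noise, p_noise + r), gauss(y, speech, p_speech + r)]
        joint = [prior[0] * like[0], prior[1] * like[1]]
        z = joint[0] + joint[1]
        post = [joint[0] / z, joint[1] / z] if z > 0 else prior
        # Kalman updates, each weighted by the posterior of its own state.
        k_n = post[0] * p_noise / (p_noise + r)
        noise += k_n * (y - noise)
        p_noise *= 1 - k_n
        k_s = post[1] * p_speech / (p_speech + r)
        speech += k_s * (y - speech)
        p_speech *= 1 - k_s
        # Hard constraints standing in for the dynamic distribution:
        # keep both levels in range and speech at least min_snr dB above noise.
        noise = min(max(noise, lo), hi - min_snr)
        speech = min(max(speech, noise + min_snr), hi)
        prob = post
        decisions.append(post[1] > 0.5)
    return noise, speech, decisions
```

Calling `track_levels([40.0] * 30 + [70.0] * 30)` feeds the sketch 30 quiet frames followed by 30 loud ones. A ratio between linear-domain levels becomes a difference in dB, which is why the third constraint appears here as a minimum dB gap.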