Robust voice activity detection based on the entropy of noise-suppressed spectrum

A novel noise robust voice activity detection approach is introduced. The novelty of the method that it uses noise suppressed spectrum of the input signal for spectral entropy calculation. As a result excellent end-pointing performance is observed based on predefined global entropy threshold and time constraints. The effect of frame dropping controlled by the proposed algorithm was investigated on the accuracy of automatic speech recognition. The experiments were performed on Hungarian publicly available noisy and normal telephony speech databases. The relative improvement due to dropping of non-speech frames was positive in all test configurations with a maximum of 29,5%. Besides, in average more than 50% of the frames were dropped.

[1]  Xu Bo,et al.  AN IMPROVED ENTROPY-BASED ENDPOINT DETECTION ALGORITHM , 2002 .

[2]  Andrzej Drygajlo,et al.  Entropy based voice activity detection in very noisy conditions , 2001, INTERSPEECH.

[3]  Izhak Shafran,et al.  Robust speech detection and segmentation for real-time ASR applications , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[4]  Silvio Montrésor,et al.  Speech signal detection in noisy environement using a local entropic criterion , 1997, EUROSPEECH.

[5]  Chung-Ho Yang,et al.  A novel approach to robust speech endpoint detection in car environments , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[6]  Jeih-Weih Hung,et al.  Robust entropy-based endpoint detection for speech recognition in noisy environments , 1998, ICSLP.