Robust Speech Endpoint Detection Based on Improved Adaptive Band-Partitioning Spectral Entropy

The performance of speech recognition systems often degrades in adverse environments, and accurate speech endpoint detection is essential for robust recognition. This paper proposes an improved adaptive band-partitioning spectral entropy algorithm for speech endpoint detection, which uses weighted power spectral subtraction to raise the signal-to-noise ratio (SNR) while preserving robustness. Adaptive band-partitioning spectral entropy divides each frame into sub-bands, the number of which is selected adaptively, and computes the spectral entropy over them. Although this approach is robust, its accuracy degrades rapidly when the SNR is low. Weighted power spectral subtraction is therefore introduced to reduce the spectral effects of additive acoustic noise in speech. Speech recognition experiments indicate that recognition accuracy improves in adverse environments.
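The two ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the over-subtraction weight `alpha`, the spectral floor `beta`, and the fixed `num_bands` are all assumptions made for illustration, and the adaptive selection of the number of sub-bands described in the abstract is not reproduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real-valued frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def weighted_spectral_subtraction(power, noise_power, alpha=2.0, beta=0.01):
    """Subtract a weighted noise estimate from each bin.

    alpha (over-subtraction weight) and beta (floor factor) are
    illustrative values, not parameters taken from the paper.
    """
    return [max(p - alpha * n, beta * n) for p, n in zip(power, noise_power)]


def band_spectral_entropy(power, num_bands):
    """Entropy of the normalized sub-band energies of one frame.

    The number of bands is fixed here (assumption); bins left over when
    len(power) is not divisible by num_bands are ignored.
    """
    size = len(power) // num_bands
    bands = [sum(power[i * size:(i + 1) * size]) for i in range(num_bands)]
    total = sum(bands)
    if total <= 0:
        return 0.0
    probs = [b / total for b in bands]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

In this sketch a frame whose energy is spread evenly over the sub-bands yields the maximum entropy, `log(num_bands)`, while a frame with energy concentrated in a single sub-band yields an entropy of zero. An endpoint detector would compare the per-frame entropy of the noise-reduced spectrum against a threshold; the choice of threshold is outside the scope of this sketch.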
