Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice

This paper presents an algorithm for speech endpoint detection in noisy environments, especially those with non-stationary noise. The input signal is first decomposed into several sub-bands. In each sub-band, an energy sequence is tracked and analyzed separately to decide whether a temporal segment is stationary. A voiced-speech detection algorithm based on the harmonic structure of voice is proposed and applied to each non-stationary segment to check whether it contains speech. The speech endpoints are then determined by combining the results of energy detection and voice detection. Experiments in real noise environments show that the proposed approach is more reliable than some standard methods.
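The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no band layout, thresholds, or harmonic measure, so every name and parameter below (`n_bands`, `ratio_db`, `threshold`, the median noise floor) is an assumption. In particular, the harmonic-structure check is replaced here by a simple normalized-autocorrelation periodicity test, which only approximates the idea.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (n_frames, frame_len)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def subband_energies(frames, n_bands):
    """Per-frame energy in n_bands equal-width FFT bands (assumed layout)."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    return np.stack([b.sum(axis=1) for b in bands], axis=1)

def nonstationary_frames(energies, ratio_db=6.0):
    """Flag frames whose energy in any band rises above that band's floor.

    The floor is a crude per-band median over the whole recording
    (an assumption; the paper tracks each energy sequence over time).
    """
    log_e = 10.0 * np.log10(energies + 1e-12)
    floor = np.median(log_e, axis=0)
    return np.any(log_e - floor > ratio_db, axis=1)

def is_periodic(frame, fs, f0_range=(80.0, 400.0), threshold=0.5):
    """Stand-in voicing test: normalized autocorrelation peak in the pitch range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return False
    lo = int(fs / f0_range[1])
    hi = int(fs / f0_range[0])
    return bool(ac[lo:hi + 1].max() / ac[0] > threshold)

def detect_speech_frames(x, fs, frame_len=256, hop=128, n_bands=4):
    """Combine both cues: voicing is checked only on non-stationary frames."""
    frames = frame_signal(x, frame_len, hop)
    active = nonstationary_frames(subband_energies(frames, n_bands))
    return np.array([bool(a) and is_periodic(f, fs) for a, f in zip(active, frames)])

def endpoints(flags, frame_len=256, hop=128):
    """Return (start_sample, end_sample) of the flagged region, or None."""
    hits = np.flatnonzero(flags)
    if hits.size == 0:
        return None
    return int(hits[0] * hop), int(hits[-1] * hop + frame_len)
```

Restricting the voicing test to frames already flagged as non-stationary mirrors the order of operations in the abstract: energy analysis narrows down candidate segments, and the voice check then rejects non-speech noise bursts among them.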
