Bispectra Analysis-Based VAD for Robust Speech Recognition

A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach first filters the input channel to suppress high-energy noise components and then estimates the speech/non-speech bispectra by means of third-order auto-cumulants. The algorithm differs from many others both in how the decision rule is formulated (statistical detection tests) and in the domain in which it operates (the bispectrum). Clear improvements in speech/non-speech discrimination accuracy demonstrate the effectiveness of the proposed VAD. It is shown that applying statistical detection tests yields a better separation of the speech and noise distributions, allowing more effective discrimination and a trade-off between complexity and performance. The algorithm also incorporates a preceding noise reduction block that improves the accuracy of speech/non-speech detection. Experiments on the AURORA databases and tasks provide an extensive performance evaluation together with a detailed comparison with standard VADs such as ITU-T G.729, GSM AMR and ETSI AFE for distributed speech recognition (DSR), as well as with other recently reported VADs.
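As a rough illustration of the bispectral principle the approach builds on (not the authors' implementation), the following minimal sketch estimates a frame's bispectrum by the indirect method. It computes the biased sample third-order cumulant c3(t1, t2) up to a maximum lag, applies a 2-D lag window w(t1, t2) = d(t1) d(t2) d(t2 - t1) built from a triangular window d, and takes a 2-D DFT. Because the bispectrum of Gaussian noise is theoretically zero while speech is non-Gaussian, the mean bispectral magnitude divided by variance^1.5 serves here as a toy speech/non-speech statistic. All function names, `max_lag`, `nfft`, the triangular window and the fixed `threshold` are assumptions chosen for illustration; the noise reduction stage and the statistical detection tests of the proposed VAD are not reproduced.

```python
import cmath
import math
import random


def third_order_cumulant(frame, max_lag):
    """Biased sample estimate of c3(t1, t2) = E[x(k) x(k+t1) x(k+t2)].

    The frame mean is removed first; for zero-mean data the third-order
    cumulant equals the third-order moment. Returns {(t1, t2): value}.
    """
    n = len(frame)
    mean = sum(frame) / n
    z = [v - mean for v in frame]
    c3 = {}
    for t1 in range(-max_lag, max_lag + 1):
        for t2 in range(-max_lag, max_lag + 1):
            # Sum only over k where k, k+t1 and k+t2 all fall inside the frame.
            lo = max(0, -t1, -t2)
            hi = min(n, n - t1, n - t2)
            acc = sum(z[k] * z[k + t1] * z[k + t2] for k in range(lo, hi))
            c3[(t1, t2)] = acc / n
    return c3


def lag_window(t1, t2, max_lag):
    """2-D lag window w(t1, t2) = d(t1) d(t2) d(t2 - t1), with d triangular."""
    def d(m):
        return max(0.0, 1.0 - abs(m) / (max_lag + 1))
    return d(t1) * d(t2) * d(t2 - t1)


def bispectrum(frame, max_lag=4, nfft=16):
    """Indirect bispectrum estimate: 2-D DFT of the windowed cumulant.

    Evaluated on an nfft x nfft grid of normalized frequencies k / nfft
    (cycles per sample); nfft >= 2 * max_lag + 1 avoids lag aliasing.
    """
    c3 = third_order_cumulant(frame, max_lag)
    b = [[0j] * nfft for _ in range(nfft)]
    for (t1, t2), c in c3.items():
        w = lag_window(t1, t2, max_lag)
        if w == 0.0:
            continue  # lags outside the window's support contribute nothing
        for i in range(nfft):
            for j in range(nfft):
                phase = -2.0 * math.pi * (i * t1 + j * t2) / nfft
                b[i][j] += w * c * cmath.exp(1j * phase)
    return b


def bispectral_statistic(frame, max_lag=4, nfft=16):
    """Hypothetical statistic: mean |B(f1, f2)| / variance**1.5.

    Dividing by variance**1.5 makes it gain-invariant; it stays close to
    zero (up to estimation error) for Gaussian noise.
    """
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n
    if var == 0.0:
        return 0.0
    b = bispectrum(frame, max_lag, nfft)
    mean_mag = sum(abs(v) for row in b for v in row) / (nfft * nfft)
    return mean_mag / var ** 1.5


def is_speech(frame, threshold=0.8):
    """Hypothetical fixed-threshold decision (no noise adaptation)."""
    return bispectral_statistic(frame) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]      # Gaussian noise
    skewed = [rng.expovariate(1.0) for _ in range(2048)]    # non-Gaussian stand-in
    print(is_speech(noise), is_speech(skewed))
```

The skewed exponential signal is only a stand-in for non-Gaussian speech, and the 2048-sample frames were chosen to keep the cumulant estimates stable; VAD frames are typically much shorter, and a practical detector would adapt its threshold to the noise rather than fix it.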