Bispectrum Estimators for Voice Activity Detection and Speech Recognition

This paper presents a new application of bispectral analysis. A set of bispectrum estimators is proposed for a robust and effective voice activity detection (VAD) algorithm that improves speech recognition performance in noisy environments. The approach first filters the input channel to suppress high-energy noise components and then estimates the speech/non-speech bispectra by means of third-order auto-cumulants. The algorithm differs from many others in how the decision rule is formulated (detection tests) and in the domain in which it operates. Clear improvements in speech/non-speech discrimination accuracy demonstrate the effectiveness of the proposed VAD. It is shown that applying a statistical detection test leads to better separation of the speech and noise distributions, thus allowing more effective discrimination and a tradeoff between complexity and performance. The algorithm also incorporates a preceding noise reduction block that improves the accuracy of speech/non-speech detection. The experimental analysis carried out on the AURORA databases and tasks provides an extensive performance evaluation together with an exhaustive comparison to standard VADs such as ITU G.729, GSM AMR and ETSI AFE for distributed speech recognition (DSR), and to other recently reported VADs.
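
As an illustration of the quantity the abstract refers to, the following is a minimal sketch of a generic indirect bispectrum estimate: a biased sample estimate of the third-order auto-cumulant of a zero-mean frame, followed by a two-dimensional Fourier transform over the two lags. It is not the estimator proposed in the paper. The function names, the lag range `max_lag`, the transform size `nfft`, and the absence of a lag window are all assumptions made for this example.

```python
import numpy as np


def third_order_cumulant(x, max_lag):
    """Biased estimate of c3(t1, t2) = E[x(n) x(n+t1) x(n+t2)], |t1|, |t2| <= max_lag.

    Hypothetical helper: for zero-mean data the third-order cumulant
    equals the third-order moment, so the frame is mean-removed first.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    lags = range(-max_lag, max_lag + 1)
    c3 = np.zeros((2 * max_lag + 1, 2 * max_lag + 1))
    for i, t1 in enumerate(lags):
        for j, t2 in enumerate(lags):
            # Keep only samples for which all three indices are in range.
            lo = max(0, -t1, -t2)
            hi = min(n, n - t1, n - t2)
            prod = x[lo:hi] * x[lo + t1:hi + t1] * x[lo + t2:hi + t2]
            c3[i, j] = prod.sum() / n
    return c3


def bispectrum(x, max_lag, nfft=64):
    """Indirect bispectrum estimate: 2-D FFT of the cumulant sequence.

    Assumes nfft >= 2 * max_lag + 1; no lag window is applied here.
    """
    c3 = third_order_cumulant(x, max_lag)
    padded = np.zeros((nfft, nfft))
    for i, t1 in enumerate(range(-max_lag, max_lag + 1)):
        for j, t2 in enumerate(range(-max_lag, max_lag + 1)):
            # Negative lags wrap around so that lag (0, 0) sits at index [0, 0].
            padded[t1 % nfft, t2 % nfft] = c3[i, j]
    return np.fft.fft2(padded)
```

For a Gaussian process the third-order cumulants vanish in expectation, so the estimated bispectrum of Gaussian noise should stay close to zero, while a skewed, non-Gaussian signal yields clearly non-zero values. This property is the usual motivation for bispectrum-based detection tests.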
