Speech/non-speech discrimination based on contextual information integrated bispectrum LRT

This letter presents an effective statistical voice activity detection (VAD) algorithm based on the integrated bispectrum. The integrated bispectrum is defined as a cross spectrum between the signal and its square. It inherits the ability of higher order statistics to detect signals in noise and offers two additional advantages: 1) its computation as a cross spectrum leads to significant computational savings, and 2) the variance of the estimator is of the same order as that of the power spectrum estimator. The decision rule is formulated as an average likelihood ratio test (LRT) over successive integrated bispectrum speech features. With these and other innovations, the proposed method yields significant improvements in speech/pause discrimination and in speech recognition performance over standardized techniques such as the ITU-T G.729, ETSI AMR, and AFE VADs, and over recently published VADs.
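As an illustration of the two ideas named in the abstract, the following is a minimal sketch, not the letter's implementation. It assumes a Welch-style averaged cross-periodogram between the mean-removed squared signal and the signal itself as the integrated bispectrum estimate, and a zero-mean complex Gaussian model per frequency bin for the averaged log-likelihood ratio. The function names, the Hann window, the segment sizes, and the variance parameters are all hypothetical choices made for this example.

```python
import numpy as np


def integrated_bispectrum(x, nfft=256, hop=128):
    """Estimate the cross spectrum between x**2 and x.

    Assumption: both sequences have their mean removed, and the estimate is an
    average of Hann-windowed cross-periodograms over overlapping segments.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    y = x**2 - np.mean(x**2)          # squared signal, mean removed
    window = np.hanning(nfft)
    acc = np.zeros(nfft // 2 + 1, dtype=complex)
    count = 0
    for start in range(0, len(x) - nfft + 1, hop):
        X = np.fft.rfft(window * x[start:start + nfft])
        Y = np.fft.rfft(window * y[start:start + nfft])
        acc += Y * np.conj(X)         # cross-periodogram of this segment
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than nfft")
    # Normalize so that unit-variance white noise has a power spectrum of 1.
    return acc / (count * np.sum(window**2))


def average_log_lrt(features, noise_var, speech_var):
    """Average log-likelihood ratio over frames and frequency bins.

    Assumption: each complex feature is zero-mean complex Gaussian with
    variance noise_var under H0 (pause) and noise_var + speech_var under H1.
    features has shape (frames, bins).
    """
    power = np.abs(np.asarray(features)) ** 2
    v0 = noise_var
    v1 = noise_var + speech_var
    llr = power / v0 - power / v1 - np.log(v1 / v0)
    return float(llr.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = rng.standard_normal(16384)           # symmetric: bispectrum near 0
    skewed = rng.exponential(1.0, 16384) - 1.0      # skewed: bispectrum nonzero
    print(np.mean(np.abs(integrated_bispectrum(gaussian))))
    print(np.mean(np.abs(integrated_bispectrum(skewed))))
```

In this sketch a frame window would be declared speech when `average_log_lrt` exceeds a threshold; the threshold and the way the variances are estimated are left open because the abstract does not specify them.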
