Change Point Detection in GARCH Models for Voice Activity Detection

This paper presents a robust algorithm for voice activity detection (VAD) based on change point detection in a generalized autoregressive conditional heteroscedasticity (GARCH) process. GARCH models are statistical models of time-varying conditional variance that are widely used for economic time series and are also a popular choice for modeling speech signals and their changing variances. Change point detection is likewise an important problem in economics. The proposed method assumes no specific probability distributions for speech and noise, and it does not use a likelihood ratio test (LRT) to detect speech/nonspeech intervals. To test parameter constancy in GARCH models, we describe the algorithm of the Cramér-von Mises (CVM) test, a nonparametric test based on empirical quantiles. We show that VAD is related to the parameter constancy test in a GARCH process, and we illustrate this with several examples.
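The idea of treating a speech onset as a change point in a GARCH process can be illustrated with a small sketch. The code below is not the paper's algorithm: it simulates a GARCH(1,1) series whose parameter `omega` jumps from a low-variance ("noise") regime to a high-variance ("speech-like") regime, then scans for the change with a generic two-sample Cramér-von Mises-type statistic computed on the squared samples. The function names, parameter values, and the `min_seg` guard are all assumptions made for illustration.

```python
import numpy as np


def simulate_garch11(n, omega, alpha, beta, rng):
    """Simulate n samples of a Gaussian GARCH(1,1) process.

    sigma_t^2 = omega + alpha * x_{t-1}^2 + beta * sigma_{t-1}^2
    Requires alpha + beta < 1 so the unconditional variance exists.
    """
    x = np.empty(n)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        x[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * x[t] ** 2 + beta * sigma2
    return x


def cvm_change_scan(y, min_seg=20):
    """Scan for one change point with a two-sample CvM-type statistic.

    For each split k, compare the empirical distribution function of
    y[:k] with that of y[k:], evaluated at every sample value.
    Returns (k_hat, stats), where k_hat is the split with the largest
    statistic. Illustrative stand-in, not the test from the paper.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # ind[i, j] = 1 if y[i] <= y[j]; column j is evaluated at the point y[j]
    ind = (y[:, None] <= y[None, :]).astype(float)
    prefix = np.cumsum(ind, axis=0)  # prefix[k-1, j] = count among first k samples
    total = prefix[-1]
    stats = np.zeros(n)
    for k in range(min_seg, n - min_seg + 1):
        f_left = prefix[k - 1] / k
        f_right = (total - prefix[k - 1]) / (n - k)
        weight = k * (n - k) / n ** 2  # down-weights splits near the edges
        stats[k] = weight * np.sum((f_left - f_right) ** 2)
    return int(np.argmax(stats)), stats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical regimes: the two segments are simulated independently
    # and concatenated, so the true change is at sample 300.
    noise = simulate_garch11(300, omega=0.01, alpha=0.1, beta=0.8, rng=rng)
    speech = simulate_garch11(300, omega=0.5, alpha=0.1, beta=0.8, rng=rng)
    x = np.concatenate([noise, speech])
    k_hat, _ = cvm_change_scan(x ** 2)
    print("estimated change point:", k_hat)
```

The scan uses only empirical distribution functions, so it makes no assumption about the speech or noise distributions, which is the property the abstract emphasizes. A real detector would also need a critical value for the statistic and a way to handle multiple change points; both are left out of this sketch.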
