Statistical Modeling of the Speech Signal

The Gaussian distribution is the most commonly used statistical model of the speech signal. In this paper we propose a more general statistical model for the distributions of the real and imaginary parts of the speech signal's DFT coefficients and of their magnitudes. Based on experimental measurements on the TIMIT database, we show that the Generalized Gaussian Distribution (GGD) provides a good fit across frequencies and audio frame sizes. A Weibull distribution is proposed to model the statistical behavior of the speech signal amplitude in the frequency domain. The distribution parameters estimated from the experimental measurements yield densities that match the measured distributions of the real and imaginary parts well. We propose and evaluate several statistical models of varying complexity. Overall, these models fit the actual measurements with a Jensen-Shannon divergence below 0.0012 for the real and imaginary parts and below 0.003 for the magnitudes. The results presented in this paper can be applied to improve speech processing algorithms based on statistical signal processing.

Keywords: speech statistical model, generalized Gaussian distribution, Weibull distribution
