We propose a model for the generation of speech signals based on their stochastic properties. We show that the speech signal can be modeled as the product of a Gaussian random process (RP) and a slowly time-varying Rayleigh RP. This model is justified because it yields a spherically invariant random process (SIRP) that is Gaussian distributed over short intervals and Laplacian distributed over long intervals. We support this result by studying the probability density function (PDF) of the power spectral density (PSD) of the speech signal, estimated using linear predictive coding (LPC), for several segmentation lengths. Our experiments show that the PDF of the estimated PSD is well approximated by a Rayleigh distribution around the formant frequencies and by a Gaussian distribution at frequencies far from the formant frequencies.
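The short-interval and long-interval behavior described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experiment: it assumes that "slowly time-varying" can be approximated by a Rayleigh envelope held constant within each segment, it uses white Gaussian noise in place of a spectrally shaped Gaussian RP, and the names `simulate_sirp`, `kurtosis`, `num_segments`, `segment_len`, and `sigma` are hypothetical. Kurtosis is used as a simple indicator, since a Gaussian distribution has kurtosis 3 and a Laplacian distribution has kurtosis 6.

```python
import numpy as np


def simulate_sirp(num_segments=10000, segment_len=200, sigma=1.0, seed=0):
    """Product of white Gaussian noise and a piecewise-constant Rayleigh envelope.

    Assumption: the slowly varying Rayleigh RP is approximated by one
    independent Rayleigh draw per segment, held constant inside the segment.
    """
    rng = np.random.default_rng(seed)
    envelope = rng.rayleigh(scale=sigma, size=num_segments)
    gaussian = rng.standard_normal((num_segments, segment_len))
    # Broadcast the per-segment envelope over the samples of each segment.
    return envelope[:, None] * gaussian


def kurtosis(x, axis=None):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=axis, keepdims=True)
    m2 = (centered ** 2).mean(axis=axis)
    m4 = (centered ** 4).mean(axis=axis)
    return m4 / m2 ** 2


signal = simulate_sirp()

# Short intervals: kurtosis computed inside each segment (envelope is constant).
short_term = float(np.mean(kurtosis(signal, axis=1)))

# Long interval: kurtosis computed over all segments pooled together.
long_term = float(kurtosis(signal.ravel()))

print(f"mean short-interval kurtosis: {short_term:.2f} (Gaussian reference: 3)")
print(f"long-interval kurtosis:       {long_term:.2f} (Laplacian reference: 6)")
```

Within a segment the envelope only scales the Gaussian samples, so the short-interval statistics stay Gaussian. Pooled over many segments, the samples form a Gaussian scale mixture with a Rayleigh-distributed scale, which is Laplacian distributed.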