Stochastic coding of speech signals at very low bit rates: The importance of speech perception

Abstract: We describe a new stochastic model for generating speech signals suitable for coding at low bit rates. In this model, the speech waveform is represented as a zero-mean Gaussian process with a slowly varying power spectrum. The optimum innovation sequence is obtained by minimizing a subjective error criterion based on properties of human auditory perception. Each block of 40 samples of the innovation signal (5 ms of speech sampled at 8 kHz) is coded as one of 1024 randomly generated Gaussian sequences of length 40. The chosen sequence is the one that minimizes a spectrally weighted error criterion. The innovation signal is thus encoded at 2 kbit/s (10 bits per 5 ms block). A time-varying linear filter, whose parameters are determined directly from the speech signal, is used to produce the desired power spectrum. Even at this low bit rate, the resynthesized speech is barely distinguishable from the original.
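
The bit-rate arithmetic and the codebook search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The block length, sampling rate, and codebook size are taken from the abstract; everything else is an assumption made for illustration: the function names, the fixed all-pole filter coefficients, the per-block gain, the zero initial filter state, and the use of a plain squared error in place of the spectrally weighted criterion.

```python
import math
import random

BLOCK_LEN = 40        # samples per block (from the abstract)
SAMPLE_RATE = 8000    # Hz (from the abstract)
CODEBOOK_SIZE = 1024  # candidate Gaussian sequences (from the abstract)


def bit_rate(codebook_size, block_len, sample_rate):
    """Bits per second needed to send one codebook index per block."""
    bits_per_block = math.log2(codebook_size)
    blocks_per_second = sample_rate / block_len
    return bits_per_block * blocks_per_second


def make_codebook(size, length, seed=0):
    """Randomly generated unit-variance Gaussian sequences (seed is arbitrary)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def synthesize(excitation, a):
    """All-pole filter y[n] = x[n] - sum_k a[k] * y[n-1-k], zero initial state.

    Assumption: a real coder would carry filter memory across blocks and
    update the coefficients `a` from the speech signal over time.
    """
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


def search(target, codebook, a):
    """Exhaustive search: return (index, gain, error) of the best codeword.

    Assumption: plain squared error with an optimal scalar gain stands in
    for the perceptually weighted criterion of the abstract.
    """
    target_energy = sum(t * t for t in target)
    best = (None, 0.0, float("inf"))
    for i, codeword in enumerate(codebook):
        y = synthesize(codeword, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy
        err = target_energy - corr * corr / energy
        if err < best[2]:
            best = (i, gain, err)
    return best


if __name__ == "__main__":
    print(bit_rate(CODEBOOK_SIZE, BLOCK_LEN, SAMPLE_RATE), "bit/s")
    codebook = make_codebook(CODEBOOK_SIZE, BLOCK_LEN)
    a = [-0.9]  # hypothetical one-pole synthesis filter
    target = [2.5 * v for v in synthesize(codebook[37], a)]
    print(search(target, codebook, a))
```

Only the 10-bit index is counted in the 2 kbit/s figure; the gain and the filter parameters used in this sketch would have to be transmitted separately.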