Parametric models of the magnitude/phase spectrum for harmonic speech coding

A method is described for representing magnitude and phase in a sinusoidal transform coder. Instead of transmitting individual sinusoids, the entire speech spectrum is transmitted. The synthesizer estimates the frequency, amplitude, and phase of each harmonic from the spectrum. Relatively high-quality speech in the 4.8 to 9.6 kb/s range is obtained by modeling the magnitude/phase spectrum with a combination of pole-zero analysis, phase prediction, and vector quantization. A window subtraction method ensures proper synthesis of unvoiced speech. The system is robust since it does not depend on pitch estimates or voicing decisions.
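The abstract does not say how the synthesizer recovers the harmonic parameters from the transmitted spectrum. As a rough illustration only, the sketch below uses generic peak picking on a sampled complex spectrum, which is an assumption and not necessarily the paper's procedure. The function name pick_sinusoids, the max_peaks parameter, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def pick_sinusoids(spectrum, sample_rate, max_peaks=40):
    """Read (frequency_hz, magnitude, phase) off local maxima of |spectrum|.

    Assumes `spectrum` is a one-sided complex spectrum of an even-length
    frame, such as the output of np.fft.rfft.
    """
    mag = np.abs(spectrum)
    n_fft = 2 * (len(spectrum) - 1)

    # Interior bins that are larger than the left neighbour and at least
    # as large as the right neighbour count as peaks.
    k = np.arange(1, len(mag) - 1)
    peaks = k[(mag[k] > mag[k - 1]) & (mag[k] >= mag[k + 1])]

    # Keep only the strongest peaks, then restore ascending frequency order.
    strongest = np.argsort(mag[peaks])[::-1][:max_peaks]
    peaks = np.sort(peaks[strongest])

    freqs = peaks * sample_rate / n_fft
    return freqs, mag[peaks], np.angle(spectrum[peaks])


# Synthetic "voiced" frame: three harmonics of 200 Hz placed on exact bins.
sample_rate = 8000
n = 400  # 20 Hz bin spacing
t = np.arange(n) / sample_rate
true_amps = [1.0, 0.5, 0.25]
true_phases = [0.3, -1.0, 2.0]
frame = sum(
    a * np.cos(2 * np.pi * 200 * (h + 1) * t + p)
    for h, (a, p) in enumerate(zip(true_amps, true_phases))
)

freqs, mags, phases = pick_sinusoids(np.fft.rfft(frame), sample_rate, max_peaks=3)
amps = mags * 2 / n  # undo the DFT scaling for a real cosine on an exact bin
```

In this toy case no pitch estimate or voicing decision is needed, because the peaks are taken directly from the spectrum. A real coder would work from the decoded model spectrum rather than an FFT of the original frame, and would need interpolation for peaks that fall between bins.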
