Frequency-domain spectral envelope estimation for low rate coding of speech

Estimating the spectral envelope in the frequency domain avoids some of the problems that linear prediction (LP) algorithms have with voiced speech. We present a low-complexity method for estimating the spectral envelope from harmonics, intended for low-rate coding. The method consists of three steps: computing the harmonic amplitude spectrum with a pitch-synchronous DFT whose length depends on voicing, modifying this spectrum outside the telephone bandwidth to simplify modeling of the useful bandwidth, and interpolating it with a frequency-domain low-pass filter. An all-pole model is then fitted to this modified, smoothed version of the harmonic spectrum. The method was implemented in the harmonic-stochastic excitation (HSX) vocoder, and its performance was compared with an LP algorithm similar to that used in the G.729 speech coding standard. A-B comparison tests show a marked improvement in perceptual quality.
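
The processing chain described above can be illustrated with a minimal sketch; it is not the authors' implementation. The function names `harmonic_amplitudes` and `all_pole_from_envelope`, the model order, and the grid size are hypothetical. Linear interpolation stands in for the frequency-domain low-pass filter, the modification outside the telephone bandwidth is omitted, and the all-pole fit uses the common autocorrelation route (inverse FFT of the power envelope followed by the Levinson-Durbin recursion), which the abstract does not confirm as the fitting procedure.

```python
import numpy as np

def harmonic_amplitudes(frame, period):
    """Harmonic amplitudes from a DFT spanning a whole number of pitch periods.

    `period` is the pitch period in samples (assumed to be an integer here).
    """
    frame = np.asarray(frame, dtype=float)
    n_periods = len(frame) // period
    seg = frame[: n_periods * period]
    # Scale so that a cosine of amplitude A yields A at its bin.
    spec = 2.0 * np.abs(np.fft.rfft(seg)) / len(seg)
    # With a pitch-synchronous length, harmonic k falls exactly on bin k * n_periods.
    k_max = (period - 1) // 2  # keep harmonics strictly below the Nyquist frequency
    return spec[n_periods * np.arange(1, k_max + 1)]

def all_pole_from_envelope(harm_amps, period, order=10, n_fft=512):
    """Fit an all-pole model to an envelope interpolated between harmonics."""
    freqs = np.arange(1, len(harm_amps) + 1) / period  # cycles per sample
    grid = np.arange(n_fft // 2 + 1) / n_fft
    # Assumption: linear interpolation replaces the paper's low-pass interpolation.
    env = np.interp(grid, freqs, harm_amps)
    # Autocorrelation lags are the inverse DFT of the power envelope.
    r = np.fft.irfft(env ** 2, n=n_fft)[: order + 1]
    # Levinson-Durbin recursion for the predictor polynomial A(z).
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For a 160-sample frame with a 40-sample pitch period, `harmonic_amplitudes(frame, 40)` returns 19 harmonic amplitudes, and `all_pole_from_envelope` turns them into the coefficients of a 10th-order envelope model 1/A(z) together with the residual energy.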
