Modified approach to spectral estimation for sinusoidal transform coding

As mobile telephony continues to develop, demand for speech coders at ever lower bit-rates is increasing research interest in sinusoidal modeling, as used by sinusoidal transform coding (STC) and other techniques. With STC, the parameters for voiced speech are the amplitudes, frequencies and phases of sinusoids derived from a high-resolution short-term Fourier transform of speech segments, performed at intervals of 20 to 30 ms. These parameters are traditionally derived from the short-term speech spectral envelope by peak-picking and cubic spline interpolation, and are efficiently represented by the coefficients of an all-pole digital filter derived by a form of linear prediction. The accuracy of spectral estimation becomes increasingly important as the coding method is adapted to progressively lower bit-rates. In this paper, discrete all-pole (DAP) modeling is applied to STC to improve the accuracy of the short-term spectral envelope for voiced speech. Although it provides more accurate spectra for voiced speech that conforms well to an all-pole model, the DAP method is known to produce occasional over-resonance, resulting in tonal artifacts. Investigation of this distortion has led to a modified approach to DAP modeling in which the unvoiced region of a speech spectrum is represented by an averaged amplitude spectrum. Objective distortion measures and informal listening tests have demonstrated that applying this modified form of DAP to STC can provide perceptible improvements in speech at bit-rates of 4 kbit/s and below.
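
As an illustration of the peak-picking step mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes a Hamming window, a zero-padded 4096-point FFT, an 8 kHz sampling rate and a simple relative threshold; the function name `pick_sinusoid_peaks` and all parameter values are hypothetical choices made for this example only.

```python
import numpy as np

def pick_sinusoid_peaks(frame, fs, n_fft=4096, rel_threshold=0.1):
    """Estimate sinusoid frequencies, amplitudes and phases from one frame.

    Illustrative assumptions: Hamming window, zero-padded FFT, and a
    threshold relative to the largest spectral magnitude.
    """
    window = np.hamming(len(frame))
    # High-resolution short-term spectrum via zero-padding to n_fft points.
    spectrum = np.fft.rfft(frame * window, n=n_fft)
    # Scale so that a sinusoid of amplitude A gives a peak of about A.
    mag = np.abs(spectrum) * 2.0 / window.sum()

    # A bin is a peak if it is a local maximum above the relative threshold.
    k = np.arange(1, len(mag) - 1)
    is_peak = (
        (mag[k] > mag[k - 1])
        & (mag[k] >= mag[k + 1])
        & (mag[k] > rel_threshold * mag.max())
    )
    peaks = k[is_peak]

    freqs = peaks * fs / n_fft          # peak frequencies in Hz
    amps = mag[peaks]                   # peak amplitudes
    phases = np.angle(spectrum[peaks])  # peak phases in radians
    return freqs, amps, phases

# Synthetic "voiced" frame: three harmonics of 200 Hz, 25 ms at 8 kHz.
fs = 8000
n = np.arange(200)
frame = sum(
    a * np.cos(2 * np.pi * f * n / fs)
    for f, a in [(200, 1.0), (400, 0.5), (600, 0.25)]
)

freqs, amps, phases = pick_sinusoid_peaks(frame, fs)
```

On this noise-free synthetic frame the picked peaks should lie close to the three harmonics. Real speech would need a more careful threshold and voicing decision, and the envelope fitting (cubic spline, linear prediction or DAP) would follow as a separate step.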