This paper introduces a sinusoidal modeling technique for low bit rate speech coding in which the parameters of each sinusoidal component are extracted sequentially by a closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of the sinusoids is restricted to sets of frequency intervals, or bins, which, in conjunction with the closed-loop analysis, allows us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one represents the harmonically related components of the voiced segment and the other represents the nonharmonically related components. This approach eliminates the need for a voicing-dependent cutoff frequency, which is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce frequency bin vector quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook in order to represent the frequency domain information in transition regions. Our use of FBVQ with closed-loop searching contributes to an improvement of speech quality in transition frames. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. To demonstrate the viability and the advantages of the new models studied, we designed a 4 kbps matching pursuits sinusoidal speech coder. Subjective results indicate that the proposed coder at 4 kbps has quality exceeding that of the 6.3 kbps G.723.1 coder.
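The closed-loop, bin-restricted extraction described above can be illustrated with a minimal sketch of generic matching pursuit over a dictionary of sinusoids. This is not the coder's actual algorithm: the function name `sinusoidal_matching_pursuit`, the bin edges, the uniform frequency grid inside each bin, the 8 kHz sampling rate, and the 160-sample frame are all assumptions made for illustration, and the test frame is a synthetic two-tone signal standing in for an LP residual. At each step the sketch picks the candidate frequency whose least-squares sinusoid removes the most energy from the current error signal, subtracts it, and retires the bin it came from, so each bin contributes at most one frequency.

```python
import numpy as np


def sinusoidal_matching_pursuit(frame, bins, fs=8000.0, grid_per_bin=10):
    """Greedy closed-loop extraction of one sinusoid per frequency bin.

    frame : 1-D array to model (stands in for an LP residual frame).
    bins  : list of (f_lo, f_hi) intervals in Hz (hypothetical layout).
    Returns ([(freq_hz, amplitude, phase), ...], final_error_signal).
    """
    n = np.arange(len(frame))
    error = np.asarray(frame, dtype=float).copy()
    remaining = list(bins)
    components = []
    while remaining:
        best = None  # (energy_removed, bin_index, freq, coefficients, fit)
        for b, (f_lo, f_hi) in enumerate(remaining):
            # Assumed uniform candidate grid inside the bin.
            for f in np.linspace(f_lo, f_hi, grid_per_bin, endpoint=False):
                w = 2.0 * np.pi * f / fs
                basis = np.column_stack((np.cos(w * n), np.sin(w * n)))
                # Least-squares fit of a*cos + c*sin to the current error.
                coef, *_ = np.linalg.lstsq(basis, error, rcond=None)
                fit = basis @ coef
                # For an orthogonal projection, the energy removed from the
                # error equals the energy of the fitted component.
                removed = float(fit @ fit)
                if best is None or removed > best[0]:
                    best = (removed, b, float(f), coef, fit)
        _, b, f, (a, c), fit = best
        error = error - fit  # closed loop: later picks see the updated error
        # a*cos(wn) + c*sin(wn) == A*cos(wn + phi)
        components.append((f, float(np.hypot(a, c)), float(np.arctan2(-c, a))))
        remaining.pop(b)  # each bin yields at most one frequency
    return components, error


# Synthetic two-tone frame (assumed values, not data from the paper).
fs = 8000.0
n = np.arange(160)
frame = (1.0 * np.cos(2 * np.pi * 450.0 * n / fs + 0.3)
         + 0.5 * np.cos(2 * np.pi * 1250.0 * n / fs - 1.0))
bins = [(400.0, 500.0), (1200.0, 1300.0)]  # hypothetical bins
components, residual_error = sinusoidal_matching_pursuit(frame, bins, fs=fs)
```

Because every selected frequency comes from a distinct bin, the result can be stored as one frequency index per bin, which is the kind of fixed-length frequency vector that lends itself to vector quantization. A perceptually motivated layout would use nonuniform bins, for example widths that grow with frequency in the manner of critical bands; the uniform bins here are only for simplicity.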