Phonetic Segmentation for Low Rate Speech Coding

Efforts to bridge the gap between waveform coders and vocoders have led to a new class of hybrid speech coders. These coders perform analysis-by-synthesis encoding of an excitation signal and reconstruct speech from the coded excitation signal and a quantized time-varying filter model of speech production. The most notable of these coders use vector quantization to code the excitation signal as a sequence of vectors. This coding technique is called Code Excited Linear Prediction (CELP) [1] or Vector Excitation Coding (VXC) [2]. VXC coders produce coded speech whose waveform approximates the original, and they achieve a satisfactory, natural-sounding quality at bit rates as low as 4.8 kb/s. When the bit rate is reduced below 4.8 kb/s, the quality of VXC coders degrades rapidly and becomes inferior to the synthetic quality of an LPC vocoder operating at 2.4 kb/s. There remains the challenging problem of finding an algorithm that at 2.4 kb/s (or even at 3.6 kb/s) achieves the quality that VXC offers at 4.8 kb/s.
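The analysis-by-synthesis search described above can be illustrated with a minimal sketch. It is not the coder from this paper: the function names, the first-order filter coefficient, the codebook size and the vector dimension are all assumptions chosen for illustration, and the perceptual weighting and long-term (pitch) predictor used in practical CELP/VXC coders are omitted. Each candidate excitation vector is passed through an all-pole synthesis filter, and the encoder keeps the index and gain that minimize the squared error against the target.

```python
import random

def synthesize(excitation, lpc):
    """All-pole synthesis filter with zero initial state:
    s[n] = e[n] - sum_k lpc[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, lpc):
    """Exhaustive analysis-by-synthesis search.
    Returns (index, gain, squared_error) of the best codevector."""
    best = None
    for i, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a silent candidate cannot be scaled to match anything
        # Optimal gain for this candidate in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (i, gain, err)
    return best

# Hypothetical setup: 16 Gaussian codevectors of dimension 20 and a
# stable first-order filter (illustrative values only).
rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(16)]
lpc = [-0.9]

# Build a target from a known codevector and gain, then recover them.
target = [0.7 * v for v in synthesize(codebook[5], lpc)]
index, gain, err = search_codebook(target, codebook, lpc)
```

Only the selected index and a quantized gain would need to be transmitted per vector, which is why the bit rate of such a coder depends directly on the codebook size and vector dimension.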

[1]  Allen Gersho, et al. Encoding of LPC spectral parameters using switched-adaptive interframe vector prediction (speech coding), 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[2]  Allen Gersho, et al. Phonetically-based vector excitation coding of speech at 3.6 kbps, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[3]  Robert M. Gray, et al. Multimode coding: application to CELP, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[4]  K. Ozawa, et al. 2.4 kbps pitch prediction multi-pulse speech coding, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[5]  B. Atal, et al. Strategies for improving the performance of CELP coders at low bit rates (speech analysis), 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[6]  T. M. Liu, et al. Phonetically-based LPC vector quantization of high quality speech, 1989, EUROSPEECH.

[7]  Richard M. Schwartz, et al. A segment vocoder at 150 b/s, 1983, ICASSP.

[8]  Bishnu S. Atal, et al. Improving performance of multi-pulse LPC coders at low bit rates, 1984, ICASSP.

[9]  M. Copperi. Rule-based speech analysis and application of CELP coding, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[10]  Manfred R. Schroeder, et al. Code-excited linear prediction (CELP): High-quality speech at very low bit rates, 1985, ICASSP '85, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Allen Gersho, et al. Vector excitation coding with dynamic bit allocation, 1988, IEEE Global Telecommunications Conference and Exhibition: Communications for the Information Age.

[12]  Allen Gersho, et al. Complexity reduction methods for vector excitation coding, 1986, ICASSP '86, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Allen Gersho, et al. Efficient Encoding of the Long-Term Predictor in Vector Excitation Coders, 1991.

[14]  Allen Gersho, et al. Real-time vector excitation coding of speech at 4800 bps, 1987, ICASSP '87, IEEE International Conference on Acoustics, Speech, and Signal Processing.