Variable dimension vector quantization of linear predictive coefficients of speech

We introduce a method for locally optimal variable-to-variable length source coding with distortion, and apply it to coding the linear predictive coefficients of speech. The method is similar to entropy-constrained vector quantization, except that the encoder uses a dynamic programming algorithm to choose among codewords of different dimensions. In doing so it automatically discovers variable-length structure in the source, in this case the acoustic-phonetic structure of speech. Exploiting this structure, the method compresses the linear predictive coefficients of speech to one-third the rate of entropy-constrained vector quantization, with no increase in spectral distortion. Listening tests indicate that, with this method, the spectral component of speech can be coded naturally and intelligibly at rates as low as 50 bits per second.
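The encoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions made only for illustration: each frame is a single scalar rather than a vector of linear predictive coefficients, squared error stands in for a spectral distortion measure, and the codebook, its codeword lengths in bits, and the Lagrange multiplier `lam` are given rather than trained. The hypothetical function `encode` uses dynamic programming to find the segmentation of the input into variable-length codewords that minimizes distortion plus `lam` times rate.

```python
import math

def encode(frames, codebook, lam):
    """Pick the codeword sequence minimizing distortion + lam * bits.

    frames   : list of floats (scalar stand-ins for LPC frames; an assumption)
    codebook : list of (codeword, bits); codeword is a tuple of floats whose
               length is the number of frames it spans
    lam      : Lagrange multiplier trading rate against distortion
    Returns (total_cost, list of codebook indices).
    """
    n = len(frames)
    best = [math.inf] * (n + 1)   # best[t]: minimum cost of coding frames[:t]
    back = [None] * (n + 1)       # back[t]: index of the last codeword used
    best[0] = 0.0

    for t in range(1, n + 1):
        for idx, (word, bits) in enumerate(codebook):
            k = len(word)
            if k > t or best[t - k] == math.inf:
                continue
            # Squared error is an illustrative distortion, not a spectral one.
            dist = sum((x - y) ** 2 for x, y in zip(frames[t - k:t], word))
            cost = best[t - k] + dist + lam * bits
            if cost < best[t]:
                best[t] = cost
                back[t] = idx

    if best[n] == math.inf:
        raise ValueError("codebook cannot tile the input")

    # Trace back from the end to recover the chosen codeword indices.
    path = []
    t = n
    while t > 0:
        idx = back[t]
        path.append(idx)
        t -= len(codebook[idx][0])
    path.reverse()
    return best[n], path


# Toy codebook: two one-frame codewords and two longer ones, 2 bits each.
codebook = [((0.0,), 2.0), ((1.0,), 2.0), ((0.0, 0.0), 2.0), ((1.0, 1.0, 1.0), 2.0)]
cost, path = encode([0.0, 0.0, 1.0, 1.0, 1.0], codebook, lam=1.0)
```

In this toy case the encoder covers the five frames with the two longer codewords, since each spans several frames for the same bit cost as a single-frame codeword. A full design procedure would presumably alternate this encoding step with codebook and codeword-length updates, as in entropy-constrained vector quantization; that part is not shown here.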