Vector quantization in speech coding

Quantization, the process of approximating continuous-amplitude signals by digital (discrete-amplitude) signals, is an important aspect of data compression or coding, the field concerned with the reduction of the number of bits necessary to transmit or store analog data, subject to a distortion or fidelity criterion. The independent quantization of each signal value or parameter is termed scalar quantization, while the joint quantization of a block of parameters is termed block or vector quantization. This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization. Vector quantization is presented as a process of redundancy removal that makes effective use of four interrelated properties of vector parameters: linear dependency (correlation), nonlinear dependency, shape of the probability density function (pdf), and vector dimensionality itself. In contrast, scalar quantization can utilize effectively only linear dependency and pdf shape. The basic concepts are illustrated by means of simple examples and the theoretical limits of vector quantizer performance are reviewed, based on results from rate-distortion theory. Practical issues relating to quantizer design, implementation, and performance in actual applications are explored. While many of the methods presented are quite general and can be used for the coding of arbitrary signals, this paper focuses primarily on the coding of speech signals and parameters.
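The distinction between scalar and vector quantization can be illustrated with a minimal sketch. It assumes a squared-error distortion measure and uses a hypothetical two-dimensional toy source with strongly correlated components. The sample values, scalar levels, and codebook below are hand-picked for illustration; they are not taken from the paper, and the codebook is not the output of any design algorithm. Both quantizers spend 2 bits per vector: the scalar quantizer uses 1 bit per component, and the vector quantizer uses a single 4-entry codebook.

```python
def scalar_quantize(x, levels):
    """Quantize one value independently to the nearest scalar level."""
    return min(levels, key=lambda q: (x - q) ** 2)


def vector_quantize(v, codebook):
    """Jointly quantize a vector: return (index, codeword) of the nearest
    codeword under squared Euclidean distance (an assumed distortion)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    index = min(range(len(codebook)), key=lambda i: dist(codebook[i]))
    return index, codebook[index]


def squared_error(v, w):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(v, w))


# Hypothetical toy data: the two components are strongly correlated.
samples = [(-1.0, -0.9), (-0.3, -0.4), (0.4, 0.3), (1.0, 1.1)]

# Scalar quantizer: 2 levels per component (1 bit each, 2 bits per vector).
levels = [-0.5, 0.5]

# Vector quantizer: 4 codewords (2 bits per vector), placed along the
# diagonal where the correlated samples actually fall.
codebook = [(-1.0, -1.0), (-0.35, -0.35), (0.35, 0.35), (1.0, 1.0)]

sq_distortion = sum(
    squared_error(v, tuple(scalar_quantize(x, levels) for x in v))
    for v in samples
) / len(samples)

vq_distortion = sum(
    squared_error(v, vector_quantize(v, codebook)[1]) for v in samples
) / len(samples)

print("scalar quantizer mean squared error:", sq_distortion)
print("vector quantizer mean squared error:", vq_distortion)
```

At the same bit rate, the vector quantizer in this toy case places all of its codewords where the data lie, whereas the component-wise scalar quantizer implicitly spends output points on regions of the plane that the correlated source never visits. This is only a demonstration of the linear-dependency effect; it says nothing about the other properties the paper discusses.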
