Distortion measures for speech processing

Several properties, interrelations, and interpretations are developed for various speech spectral distortion measures. The principal results are 1) the development of notions of relative strength and equivalence of the various distortion measures, both in a mathematical sense corresponding to subjective equivalence and in a coding sense when used in minimum distortion or nearest neighbor speech processing systems; 2) the demonstration that the Itakura-Saito and related distortion measures possess a property similar to the triangle inequality when used in nearest neighbor systems such as quantization and cluster analysis; and 3) the demonstration that the Itakura-Saito and normalized model distortion measures yield efficient computation algorithms for generalized centroids, or minimum distortion points, of groups or clusters of speech frames, an important computation both in classical cluster analysis techniques and in algorithms for optimal quantizer design. We also argue that the Itakura-Saito and related distortions are well suited computationally, mathematically, and intuitively for such applications.
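As a rough illustration of the centroid and nearest neighbor computations mentioned above, the following sketch evaluates a discretized Itakura-Saito distortion, the average of f/g - ln(f/g) - 1 over frequency samples, on sampled power spectra. It is a simplification under stated assumptions and not the formulation developed in the paper: the spectra are arbitrary positive samples rather than linear prediction models, the minimization over g is unconstrained (in which case the minimizer of the average distortion is the arithmetic mean of the spectra), and the function names `itakura_saito`, `centroid`, `average_distortion`, and `nearest_neighbor` are hypothetical.

```python
import math
import random


def itakura_saito(f, g):
    """Discretized Itakura-Saito distortion between two sampled power spectra.

    Assumes f and g are equal-length sequences of strictly positive samples.
    """
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(f, g)) / len(f)


def centroid(frames):
    """Unconstrained minimizer over g of the average distortion d(f_i, g).

    Setting the derivative of sum_i (f_i / g + ln g) to zero in each
    frequency bin gives the arithmetic mean of the f_i in that bin.
    """
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def average_distortion(frames, g):
    """Average distortion of a cluster of frames from a candidate point g."""
    return sum(itakura_saito(f, g) for f in frames) / len(frames)


def nearest_neighbor(f, codebook):
    """Index of the codebook entry with minimum distortion from f."""
    return min(range(len(codebook)), key=lambda k: itakura_saito(f, codebook[k]))


# Synthetic positive "spectra" used only for illustration (assumption).
random.seed(0)
frames = [[random.uniform(0.5, 2.0) for _ in range(8)] for _ in range(20)]
g_star = centroid(frames)
perturbed = [1.1 * x for x in g_star]

print(average_distortion(frames, g_star) <= average_distortion(frames, perturbed))
```

Note that the distortion is not symmetric in its arguments, so the order of the input spectrum and the codebook entry matters in the nearest neighbor search.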
