Information-theoretic distortion measures for speech recognition: theoretical considerations and experimental results

This paper shows that a general information-theoretic framework underlies many of the distortion measures currently popular in speech recognition. Within this framework, three general categories of information-theoretic distortion measures are introduced: the generalized Kolmogorov variational distance, the f-divergence, and the Chernoff distance. The investigation yields two major results. First, most of the important distortion measures used by workers in speech recognition arise as special cases of one or another of these classes of probability-distribution dissimilarity measures. Second, the information-theoretic perspective makes it possible to discover new distortion measures that may offer superior speech recognition performance; one such measure, the clamped log(cos β) distance, has been investigated experimentally, with promising results.
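For orientation, the three families named in the abstract can be sketched for discrete probability distributions. This is a minimal illustration using common textbook definitions, not the paper's own formulation: the generalized forms the paper works with may differ, the clamped log(cos β) distance is not reproduced here, and all function names are hypothetical. The sketch assumes both inputs are probability vectors of equal length and that every entry of `q` is strictly positive.

```python
import math


def variational_distance(p, q):
    """Kolmogorov variational distance (plain, ungeneralized form): sum |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def f_divergence(p, q, f):
    """f-divergence: sum q_i * f(p_i / q_i), for convex f with f(1) = 0.

    Assumes every q_i > 0 (an assumption of this sketch).
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def chernoff_distance(p, q, s=0.5):
    """Chernoff distance: -log sum p_i^s * q_i^(1-s), with 0 < s < 1.

    With s = 0.5 this is the Bhattacharyya distance.
    """
    return -math.log(sum(pi ** s * qi ** (1.0 - s) for pi, qi in zip(p, q)))


def kl_generator(t):
    """Generator f(t) = t log t, which makes f_divergence the Kullback-Leibler divergence."""
    return t * math.log(t) if t > 0 else 0.0


def abs_generator(t):
    """Generator f(t) = |t - 1|, which makes f_divergence the variational distance."""
    return abs(t - 1.0)


if __name__ == "__main__":
    # Two toy distributions standing in for normalized spectra (illustrative values only).
    p = [0.5, 0.3, 0.2]
    q = [0.4, 0.4, 0.2]
    print(variational_distance(p, q))
    print(f_divergence(p, q, kl_generator))
    print(chernoff_distance(p, q, s=0.5))
```

Choosing the generator f(t) = |t - 1| recovers the variational distance from the f-divergence, a small instance of the kind of special-case relationship the abstract describes.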
