Distance measures for speech processing

The properties and interrelationships of four distance measures used in speech processing are examined theoretically and experimentally. The measures considered are the root mean square (rms) log spectral distance, the cepstral distance, the likelihood ratio (as in the minimum residual principle or the delta coding (DELCO) algorithm), and a cosh measure based on two nonsymmetric likelihood ratios. It is shown that the cepstral measure bounds the rms log spectral measure from below, while the cosh measure bounds it from above. A simple nonlinear transformation of the likelihood ratio is shown to be highly correlated with the rms log spectral measure over the ranges of values expected in practice. The relationship between distance values and perceived differences is also considered. The likelihood ratio, cepstral measure, and cosh measure are easily evaluated recursively from the linear prediction filter coefficients, and each has a meaningful frequency-domain interpretation that is related to those of the others. Fortran programs for computing the recursively evaluated distance measures are presented.
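The bounding relations can be checked numerically. The following is a minimal Python/NumPy sketch, not the paper's Fortran programs. It assumes all-pole model spectra S_i(θ) = g_i²/|A_i(e^{jθ})|² with A(z) = 1 + Σ a_k z^{-k}, and it works in natural-log units (the paper's scaling, for example dB, may differ). The cepstrum of 1/A(z) comes from the standard recursion c_n = -a_n - Σ_{m=1}^{n-1} (m/n) c_m a_{n-m}. By Parseval, the truncated cepstral distance u(L) can never exceed the rms log spectral distance d2. Since cosh x - 1 ≥ x²/2, d2 can never exceed sqrt(2·d_cosh), where the cosh measure is taken here to be the mean of cosh V - 1 over frequency, with V = ln S_1 - ln S_2. That definition is our assumed normalization. All function names are hypothetical. The likelihood ratio and cosh measure are evaluated on an FFT frequency grid rather than recursively.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_L of 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k.

    Recursion: c_n = -a_n - sum_{m=1}^{n-1} (m/n) c_m a_{n-m}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] unused; the gain term is handled separately
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for m in range(max(1, n - p), n):  # only terms where a_{n-m} is nonzero
            acc -= (m / n) * c[m] * a[n - m - 1]
        c[n] = acc
    return c[1:]

def log_spectrum(a, gain, n_fft=4096):
    """Natural-log model spectrum ln(gain^2 / |A(e^{j theta})|^2) on a uniform grid."""
    A = np.fft.fft(np.concatenate(([1.0], a)), n_fft)
    return np.log(gain**2) - np.log(np.abs(A) ** 2)

def cepstral_distance(a1, g1, a2, g2, n_ceps):
    """Truncated cepstral distance u(L) = sqrt((d ln g^2)^2 + 2 sum_{n=1}^{L} (d c_n)^2)."""
    dc = lpc_to_cepstrum(a1, n_ceps) - lpc_to_cepstrum(a2, n_ceps)
    d0 = np.log(g1**2) - np.log(g2**2)
    return np.sqrt(d0**2 + 2.0 * np.sum(dc**2))

def rms_log_spectral_distance(a1, g1, a2, g2, n_fft=4096):
    """d2 = sqrt(mean over frequency of (ln S1 - ln S2)^2)."""
    v = log_spectrum(a1, g1, n_fft) - log_spectrum(a2, g2, n_fft)
    return np.sqrt(np.mean(v**2))

def cosh_measure(a1, g1, a2, g2, n_fft=4096):
    """Assumed normalization: mean over frequency of cosh(ln S1 - ln S2) - 1."""
    v = log_spectrum(a1, g1, n_fft) - log_spectrum(a2, g2, n_fft)
    return np.mean(np.cosh(v) - 1.0)

def likelihood_ratio(a_ref, a_test, n_fft=4096):
    """Mean of |A_ref|^2 / |A_test|^2 over frequency; equals 1 when a_ref == a_test."""
    A_ref = np.fft.fft(np.concatenate(([1.0], a_ref)), n_fft)
    A_test = np.fft.fft(np.concatenate(([1.0], a_test)), n_fft)
    return np.mean(np.abs(A_ref) ** 2 / np.abs(A_test) ** 2)

if __name__ == "__main__":
    # Two stable second-order predictors (illustrative values, not from the paper).
    a1, g1 = np.array([-1.2, 0.6]), 1.0
    a2, g2 = np.array([-0.9, 0.3]), 1.3
    u = cepstral_distance(a1, g1, a2, g2, n_ceps=12)
    d2 = rms_log_spectral_distance(a1, g1, a2, g2)
    dc = cosh_measure(a1, g1, a2, g2)
    print("u(12) =", u, " d2 =", d2, " sqrt(2*cosh) =", np.sqrt(2 * dc))
    print("likelihood ratio =", likelihood_ratio(a2, a1))
```

Raising n_ceps drives u(L) up toward d2, because the omitted cepstral terms are what separate the two. The FFT grid size only needs to be large enough that aliasing of the decaying cepstrum is negligible.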