Relationship between LP-residual spectral distances and phonetic judgments

The relationship between LP‐residual spectral distances [F. Itakura, IEEE Trans. Acoust. Speech Signal Process. ASSP‐23, 67–72 (1975)] and phonetic judgments was examined in two studies. The first analysis compared published phonetic similarity results [D. Klatt, Proc. ICASSP, 1278–1281 (1982)] with Itakura distances between steady‐state synthetic reference vowels (/ae/ and /a/) and a set of acoustic variants of each vowel (approximating Klatt's stimulus sets). Despite the documented utility of the Itakura measure for speech recognition, correlations between the phonetic similarity measures and Itakura distances were very low (0.20 and 0.02 for /ae/ and /a/, respectively). Significant differences in rank across the two measures were observed for acoustic variants incorporating spectral tilt, low‐pass filtering, and changes in F3. A second experiment measured Itakura distances and phonemic identification judgments for 21 acoustic variants of each of ten English reference vowels. For six reference vowels, at least one variant was identified as phonemically distinct from its reference; these phonemically distinct variants yielded smaller Itakura distances than 48% of the variants judged not phonemically distinct from the reference. These results quantify and corroborate the well‐known lack of optimality of the Itakura distance measure for speech recognition and may contribute to the design of more appropriate distance metrics.
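
For readers unfamiliar with the measure, the following is a minimal sketch of one common form of the Itakura (log‐likelihood‐ratio) distance between a reference frame and a test frame. It is illustrative only and is not taken from the studies summarized above: the function names, the LP order of 10, the autocorrelation method of LP analysis, and the choice to evaluate both residual energies on the reference frame's autocorrelation matrix are all assumptions, and other conventions exist in the literature.

```python
import numpy as np

def autocorr(x, order):
    """Autocorrelation lags 0..order of a frame (assumed already windowed)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def toeplitz_from_lags(r):
    """Symmetric Toeplitz matrix whose (i, j) entry is r[|i - j|]."""
    n = len(r)
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return r[idx]

def lpc(x, order):
    """LP inverse-filter coefficients [1, a1, ..., ap] (autocorrelation method)."""
    r = autocorr(x, order)
    R = toeplitz_from_lags(r[:order])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def itakura_distance(ref, test, order=10):
    """log of the ratio of LP residual energies, both measured on the reference frame.

    Numerator: residual energy when the reference frame is inverse-filtered
    with the test frame's LP coefficients. Denominator: residual energy with
    the reference frame's own (optimal) coefficients. The result is therefore
    non-negative, and it is not symmetric in its arguments.
    """
    a_ref = lpc(ref, order)
    a_test = lpc(test, order)
    R_ref = toeplitz_from_lags(autocorr(ref, order))
    num = a_test @ R_ref @ a_test
    den = a_ref @ R_ref @ a_ref
    return float(np.log(num / den))
```

Because the distance depends only on how well one frame's all‐pole model whitens the other frame, manipulations such as spectral tilt or low‐pass filtering can change it substantially, which is consistent with the kinds of variants for which the rank differences above were observed.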