Prediction of perceived phonetic distance from short‐term spectra—a first step

Some models of speech perception (and many speech recognition devices) posit a process whereby short‐term speech spectra are compared with a set of reference templates. These schemes will succeed only to the extent that metrics can be found that are (1) sensitive to phonetically relevant spectral differences, such as those caused by formant frequency changes, and (2) relatively insensitive to phonetically irrelevant spectral differences, such as those associated with a change in speaker identity or recording conditions. Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected. The stimulus ensemble included formant frequency changes as well as a number of acoustic changes that turn out to have little phonetic relevance (e.g., spectral tilt, relative formant amplitudes, high‐pass and low‐pass filtering). These data can be used to evaluate a spectral distance metric. For example, distance calculations based on the sum‐of‐squares of differences in critical‐band filter bank...
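As a rough illustration of a sum-of-squares spectral distance of the kind mentioned above, the following minimal sketch assumes each short-term spectrum is represented as a vector of per-channel levels in dB from a filter bank. That representation, the function name `sum_of_squares_distance`, the five-channel layout, and all numeric values are hypothetical choices made here for illustration; none of them are taken from the study.

```python
def sum_of_squares_distance(spec_a, spec_b):
    """Sum of squared per-channel differences between two spectra.

    Each spectrum is assumed (for illustration only) to be a sequence of
    filter-bank channel levels in dB, one value per channel.
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same number of channels")
    return sum((a - b) ** 2 for a, b in zip(spec_a, spec_b))


# Hypothetical 5-channel spectra (dB); values are made up, not measured data.
reference = [60.0, 55.0, 40.0, 45.0, 30.0]
# A crude stand-in for a formant-like change: energy moves between channels.
formant_shifted = [55.0, 60.0, 45.0, 40.0, 30.0]
# A crude stand-in for a spectral tilt change: levels fall off steadily.
tilted = [60.0, 53.0, 36.0, 39.0, 22.0]

d_formant = sum_of_squares_distance(reference, formant_shifted)
d_tilt = sum_of_squares_distance(reference, tilted)
print(d_formant, d_tilt)
```

In this made-up example the tilt change produces a larger distance than the formant-like change. Comparing such computed distances against listeners' judgements is one way to check a metric against criteria (1) and (2).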