Processing of spoken CVCs in the auditory periphery. I. Psychophysics.

This study provides a quantitative measure of the accuracy of the auditory periphery in representing prespecified time-frequency regions of initial and final diphones of spoken CVCs. The database comprised word pairs that span the speech space along Jakobson et al.'s binary phonemic features [Tech. Rep. No. 13, Acoustic Laboratory, MIT, Cambridge, MA (1952)]. The time-frequency domain was divided into "tiles" by splitting the frequency range into three bands ([0, 1000], [1000, 2500], and [2500, 4000] Hz) and by marking the phonemic time landmarks of the CVC utterance. Fourteen modified versions of this database were generated by introducing well-defined distortions into the time-frequency tiles (or combinations of tiles). The performance of eight listeners was measured for each of these versions using a one-interval two-alternative forced-choice paradigm, chosen to minimize the role of cognition. The results demonstrate that in the first and second frequency bands, the diphone information is far more important than the consonant information or the vowel information alone. In the third band, most of the diphone information is contained in the consonantal time interval. These observations hold for both the initial and the final consonants of spoken CVCs. The study also provides a direct mapping between Jakobson et al.'s features and particular regions in the time-frequency domain. Voicing and nasality are strongly correlated with the diphone information in the first frequency band, graveness and compactness with the diphone information in the second frequency band, and sibilation with the consonantal time interval in the third frequency band.
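
As an illustration of the tiling described above, the following is a minimal sketch, not the authors' procedure. The band edges are those given in the abstract; the function name `make_tiles`, the landmark times, and the choice of four landmarks (utterance onset, consonant-vowel boundary, vowel-consonant boundary, utterance offset) are hypothetical and serve only to show how bands and time landmarks combine into tiles.

```python
from itertools import product

# Band edges in Hz, as stated in the abstract.
BANDS_HZ = [(0, 1000), (1000, 2500), (2500, 4000)]


def make_tiles(landmarks_s, bands_hz=BANDS_HZ):
    """Return time-frequency tiles as (t_start, t_end, f_low, f_high).

    landmarks_s is a strictly increasing list of times in seconds.
    Each pair of adjacent landmarks defines one time interval, and each
    interval is crossed with every frequency band.
    """
    if any(b <= a for a, b in zip(landmarks_s, landmarks_s[1:])):
        raise ValueError("landmarks must be strictly increasing")
    intervals = list(zip(landmarks_s, landmarks_s[1:]))
    return [
        (t0, t1, f0, f1)
        for (t0, t1), (f0, f1) in product(intervals, bands_hz)
    ]


# Hypothetical landmark times (assumption, not data from the study):
# onset, C-V boundary, V-C boundary, offset.
tiles = make_tiles([0.00, 0.08, 0.30, 0.40])
print(len(tiles))  # 3 time intervals x 3 bands = 9 tiles
```

Under these assumptions, a distorted version of the database would correspond to choosing one tile, or a combination of tiles, from this list and applying the distortion only within its time and frequency limits.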