This study provides a quantitative measure of the accuracy of the auditory periphery in representing prespecified time-frequency regions of initial and final diphones of spoken CVCs. The database comprised word pairs that span the speech space along Jakobson et al.'s binary phonemic features [Tech. Rep. No. 13, Acoustic Laboratory, MIT, Cambridge, MA (1952)]. The time-frequency domain was divided into "tiles" by splitting the frequency range into three bands ([0, 1000], [1000, 2500], and [2500, 4000] Hz) and by marking the phonemic time landmarks of the CVC utterance. Fourteen modified versions of this database were generated by introducing well-defined distortions into individual time-frequency tiles or combinations of tiles. The performance of eight listeners was measured for each version using a one-interval, two-alternative forced-choice paradigm, chosen to minimize the role of cognition. The results demonstrate that in the first and second frequency bands, the diphone information is far more important than the consonant information or the vowel information alone. In the third band, most of the diphone information is contained in the consonantal time interval. These observations hold for both the initial and the final consonants of spoken CVCs. The study also provides a direct mapping between Jakobson et al.'s features and particular regions of the time-frequency domain. Voicing and nasality are strongly correlated with the diphone information in the first frequency band, graveness and compactness with the diphone information in the second frequency band, and sibilation with the consonantal time interval in the third frequency band.
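The tiling described above can be illustrated with a minimal sketch. The three frequency bands are those given in the abstract; the function name `make_tiles` and the landmark times are hypothetical, since the abstract does not state how the time intervals were delimited or how many there were.

```python
from itertools import product

# Frequency bands in Hz, as listed in the abstract.
BANDS_HZ = [(0, 1000), (1000, 2500), (2500, 4000)]


def make_tiles(landmarks_s, bands_hz=BANDS_HZ):
    """Return one (t_start, t_end, f_lo, f_hi) tile per (time interval, band) pair.

    landmarks_s: strictly ascending time landmarks in seconds. The values
    used below are made up for illustration, not taken from the study.
    """
    if any(b <= a for a, b in zip(landmarks_s, landmarks_s[1:])):
        raise ValueError("landmarks must be strictly ascending")
    # Consecutive landmarks delimit the time intervals.
    intervals = list(zip(landmarks_s, landmarks_s[1:]))
    return [(t0, t1, f0, f1)
            for (t0, t1), (f0, f1) in product(intervals, bands_hz)]


# Hypothetical CVC utterance with four landmarks, giving three time intervals.
tiles = make_tiles([0.00, 0.08, 0.30, 0.40])
```

With three time intervals and three bands this yields nine tiles; a distortion condition would then be specified as a subset of these tiles.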