Speech perception as pattern recognition.

This work provides theoretical and empirical arguments in favor of an approach to phonetics called double-weak. It is so called because it assumes relatively weak constraints on both the articulatory gestures and the auditory patterns onto which phonological elements are mapped. This approach views speech production and perception as distinct but cooperative systems. Like the motor theory of speech perception, double-weak theory accepts that phonological units are modified by context in ways that are important to perception, and it further agrees that many aspects of such context dependency originate in natural articulatory processes. However, double-weak theory sides with proponents of auditory theories of phonetics in accepting that the real-time objects of perception are well-defined auditory patterns. Because speakers find ways to obey “orderly output conditions” (Sussman et al., 1995), listeners can successfully decode speech using relatively simple pattern-recognition mechanisms. It is suggested that this situation has arisen through a stylization of gestural patterns to accommodate the real-time limits of the perceptual system. Results from a new perceptual experiment, involving a four-dimensional stimulus continuum and a 10-category /hVC/ response set, are shown to be largely compatible with this framework.
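
To make the notion of “relatively simple pattern-recognition mechanisms” concrete, the sketch below fits a plain multinomial (softmax) logistic classifier that maps a few auditory cue values onto phonological categories. The cue dimensions, category labels, and data are invented here for illustration; this is not the statistical model actually used in the experiment reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy auditory cues (F1 at vowel midpoint in Hz, vowel duration in ms) for three
# hypothetical /hVC/ responses; all values are invented for illustration only.
means = {"hid": (350.0, 120.0), "head": (550.0, 150.0), "had": (750.0, 200.0)}
labels = list(means)
X = np.vstack([rng.normal(means[l], (40.0, 15.0), size=(50, 2)) for l in labels])
y = np.repeat(np.arange(len(labels)), 50)

# Standardize the cues and fit a softmax (multinomial logistic) classifier by
# batch gradient descent on the cross-entropy loss.
X = (X - X.mean(axis=0)) / X.std(axis=0)
W = np.zeros((2, len(labels)))          # one weight per cue per category
b = np.zeros(len(labels))               # one bias per category
onehot = np.eye(len(labels))[y]
for _ in range(2000):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p - onehot                   # gradient of cross-entropy w.r.t. logits
    W -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean(axis=0)

pred = (X @ W + b).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

The point of the sketch is only that, when speakers produce orderly cue distributions, a linear decision rule over the auditory cues already separates the categories well; no articulatory reconstruction is required by the classifier.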

[1] Michael I. Jordan, et al. Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: a pilot "motor equivalence" study. The Journal of the Acoustical Society of America, 1993.

[2] Dennis H. Klatt, et al. Software for a cascade/parallel formant synthesizer, 1980.

[3] P. Ladefoged, et al. Individual differences in vowel production. The Journal of the Acoustical Society of America, 1993.

[4] G. E. Peterson, et al. Duration of Syllable Nuclei in English, 1960.

[5] H. Sussman, et al. A cross-linguistic investigation of locus equations as a phonetic descriptor for place of articulation. The Journal of the Acoustical Society of America, 1993.

[6] V. Mann, et al. Native language factors affecting use of vocalic cues to final consonant voicing in English. The Journal of the Acoustical Society of America, 1992.

[7] P. Milenkovic, et al. Statistical analysis of word-initial voiceless obstruents: preliminary data. The Journal of the Acoustical Society of America, 1988.

[8] D. Massaro, et al. Evaluation and integration of acoustic features in speech perception. The Journal of the Acoustical Society of America, 1980.

[9] R. Fox, et al. Auditory and categorical effects on cross-language vowel perception. The Journal of the Acoustical Society of America, 1994.

[10] R. S. McGowan, et al. Introduction to papers on speech recognition and perception from an articulatory point of view, 1996.

[11] R. N. Ohde, et al. Spectral and duration properties of front vowels as cues to final stop-consonant voicing. The Journal of the Acoustical Society of America, 1990.

[12] K. Stevens, et al. Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 1992.

[13] S. Blumstein, et al. A reconsideration of acoustic invariance for place of articulation in diffuse stop consonants: evidence from a cross-language study. The Journal of the Acoustical Society of America, 1981.

[14] D. Massaro, et al. Phonological context in speech perception. Perception & Psychophysics, 1983.

[15] Terrance M. Nearey, et al. Modeling the role of inherent spectral change in vowel identification, 1986.

[16] C. S. Watson, et al. Auditory temporal acuity in relation to category boundaries; speech and nonspeech stimuli. The Journal of the Acoustical Society of America, 1988.

[17] W. V. Summers. Effects of stress and final-consonant voicing on vowel production: articulatory and acoustic analyses. The Journal of the Acoustical Society of America, 1987.

[18] C. S. Watson, et al. Central factors in the discrimination and identification of complex sounds. The Journal of the Acoustical Society of America, 1985.

[19] N. Breslow, et al. Approximate inference in generalized linear mixed models, 1993.

[20] W. V. Summers, et al. F1 structure provides information for final-consonant voicing. The Journal of the Acoustical Society of America, 1988.

[21] S. Zahorian, et al. Dynamic spectral shape features as acoustic correlates for initial stop consonants, 1991.

[22] P. Mermelstein, et al. On the relationship between vowel and consonant identification when cued by the same acoustic information. Perception & Psychophysics, 1978.

[23] S. Blumstein, et al. Perceptual invariance and onset spectra for stop consonants in different vowel environments, 1976.

[24] B. Repp. Phonetic trading relations and context effects: new experimental evidence for a speech mode of perception, 1982.

[25] H. M. Sussman, et al. Locus equations derived from compensatory articulation. The Journal of the Acoustical Society of America, 1995.

[26] Jont B. Allen, et al. How do humans process and recognize speech? IEEE Transactions on Speech and Audio Processing, 1993.

[27] C. S. Watson, et al. Some remarks on Pastore (1988). The Journal of the Acoustical Society of America, 1988.

[28] Douglas H. Whalen, et al. Vowel and consonant judgments are not independent when cued by the same information, 1989.

[29] H. Sussman, et al. An investigation of locus equations as a source of relational invariance for stop place categorization, 1991.