A rational account of perceptual compensation for coarticulation

Morgan Sonderegger (morgan@cs.uchicago.edu)
Department of Computer Science, University of Chicago, Chicago, IL 60637 USA

Alan Yu (aclyu@uchicago.edu)
Phonology Laboratory, Department of Linguistics, University of Chicago, Chicago, IL 60637 USA

Abstract

A model is presented that explains perceptual compensation for context as a consequence of listeners optimally categorizing speech sounds given contextual variation. In using Bayes' rule to pick the most likely category, listeners' perception of speech sounds, which is biased toward the means of phonetic categories (Feldman & Griffiths, 2007; Feldman, Griffiths, & Morgan, 2009), is conditioned by contextual variation. The effect of varying category frequencies and variances on the resulting identification curves is discussed. A simulation case study of compensation for vowel-to-vowel coarticulation shows that the predictions of the model closely correspond to human perceptual data.

Keywords: Speech perception; perceptual compensation; rational analysis.

Introduction

A major challenge for models of speech perception is explaining the effect of context on phonemic identification. Depending on their acoustic, phonological, semantic, syntactic, and even socio-indexical contexts, identical acoustic signals can be labeled differently and different acoustic signals can be labeled identically. One of the most investigated types of contextual effects stems from phonemes' phonetic environments. Because of coarticulation, a phoneme's phonetic realization is heavily context-dependent. To understand speech, the listener must take context-induced coarticulatory effects into account in order to recover the intended message. The term perceptual compensation (PC) is often used to characterize this type of context-induced adjustment in speech perception. For example, the identification of an ambiguous target syllable as /da/ or /ga/ is shifted by a preceding /ar/ or /al/ context (Mann, 1980): the same /Ca/ token is less likely to be heard as /ga/ in an /arCa/ context than in an /alCa/ context. This effect has been argued to result from perceptual reduction of the coarticulatory fronting effect of /l/ on a following velar consonant: listeners are compensating for the effect of /l/ on /g/. This paper proposes a simple model in which PC effects emerge as an optimal solution to the problem of categorization in the presence of context-induced variation. In this model, listeners behave as if they are compensating because what is optimal differs by context.

PC effects have been observed in many phonetic settings. The fricative /ʃ/ has lower noise frequencies than /s/, and lip rounding lowers the resonant frequencies of nearby segments. Synthetic fricative noises ranging from /ʃ/ to /s/ are more often identified by English listeners as /s/ when followed by /u/ than by /a/ (Mann & Repp, 1980; see also Mitterer, 2006), presumably because listeners take into account the lowering effect of lip rounding from /u/ on the noise frequencies of /s/ in natural coarticulated speech. As another example, the perception of a fundamental frequency (f0) contour can change as a function of vowel height (Hombert, 1978; Silverman, 1987) or consonant voicing (Pardo & Fowler, 1997): /i/ is perceived as lower in pitch relative to an /a/ with the same f0, presumably because high vowels typically have higher f0 than low vowels.

Listeners' language-specific experience crucially affects the degree of perceptual compensation. In a study replicated in part below, Beddor, Harnsberger, & Lindemann (2002) found that English and Shona listeners compensate for the coarticulatory effects of V2 on V1 in CV1CV2 sequences. That is, listeners identified a continuum of synthesized vowels between /a/ and /e/ more often as /a/ when the following vowel was /i/ than when the following vowel was /a/. Importantly, they observed that Shona listeners compensate more for the vowel contexts that triggered larger acoustic influences in speech production. Compensatory responses can affect listeners' rating judgments as well: English listeners are less accurate in judging vowel nasality in nasal than in non-nasal contexts, with nasal vowels in nasal contexts the most difficult (Beddor & Krakow, 1999; Kawasaki, 1986).

Explanations of PC effects have been advanced from several theoretical perspectives. Some emphasize the lexical and phonemic content of the context in determining the identification of the target sound (Elman & McClelland, 1988; Samuel & Pitt, 2003). Gestural theorists, who assume that listeners parse the acoustic signal in terms of its articulatory sources, argue that listeners attribute the acoustic properties of a target sound to the coarticulatory context rather than to the target itself (Fowler, 1996, 2006). Auditorists attribute context-induced shifts in category boundaries to general auditory processes such as frequency contrast or spectral contrast (Diehl & Kluender, 1989; Kingston, 1992; Kingston & Diehl, 1995; Lotto & Kluender, 1998). Such auditory explanations are unavailable for compensation effects such as vowel-dependent pitch height compensation (Fowler, 2006; Lotto & Holt, 2006). Motivated by such cases, Lotto & Holt (2006) suggest that the spectral contrast explanation be supplemented with a "general learning" mechanism for category formation from correlations between stimulus parameters.

The generality of PC effects is accentuated by evidence for contextual compensation with speech and non-speech sounds in humans and non-humans (Holt, Lotto, & Kluender, 2000; Lotto, 2004). For example, when /da/-/ga/ syllables are preceded by tone glides matching in frequency the third formant (F3) transition of /al/ or /ar/, listeners' syllable identi-
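To make the computational idea concrete, here is a minimal sketch (written in Python, which the excerpt does not specify) of the kind of Bayesian categorization the abstract describes: a one-dimensional acoustic cue is assigned to whichever category has the higher posterior probability, with each category's likelihood modeled as a Gaussian whose mean is shifted by the coarticulatory context. The cue dimension, category means, variances, priors, and the size of the context shift are all illustrative assumptions, not parameters taken from the paper.

# Minimal sketch (not the authors' implementation) of Bayesian categorization
# with context-dependent category means. All numeric values are illustrative.
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D acoustic cue (an F2-like value) for a V1 continuum between
# /a/ and /e/, produced before a following context vowel V2.
CATEGORIES = {
    # category: (base mean, standard deviation, prior probability)
    "a": (1000.0, 150.0, 0.5),
    "e": (1600.0, 150.0, 0.5),
}

# Assumed coarticulatory effect of the following vowel: a following /i/
# raises the cue value of both V1 categories in production.
CONTEXT_SHIFT = {"a": 0.0, "i": 150.0}

def posterior_a(x, context):
    """P(category = /a/ | cue x, following vowel context) via Bayes' rule."""
    num, den = 0.0, 0.0
    for cat, (mu, sd, prior) in CATEGORIES.items():
        weighted = prior * norm.pdf(x, loc=mu + CONTEXT_SHIFT[context], scale=sd)
        den += weighted
        if cat == "a":
            num = weighted
    return num / den

# Identification curve along the /a/-/e/ continuum in each context.
continuum = np.linspace(900.0, 1700.0, 9)
for ctx in ("a", "i"):
    curve = [posterior_a(x, ctx) for x in continuum]
    print(f"V2 = /{ctx}/: " + " ".join(f"{p:.2f}" for p in curve))

Run as a script, the sketch prints one identification curve per following-vowel context; the curve computed for a following /i/ labels more of the continuum as /a/, which is the qualitative compensation pattern described above for Beddor, Harnsberger, & Lindemann (2002). Changing a category's prior frequency moves the category boundary, and changing its variance tilts or flattens the identification curve, in line with the effects the abstract says the paper discusses.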

[1]  Naomi H. Feldman,et al.  The influence of categories on perception: explaining the perceptual magnet effect as optimal statistical inference. , 2009, Psychological review.

[2]  James L. McClelland,et al.  Cognitive penetration of the mechanisms of perception: Compensation for coarticulation of lexically restored phonemes , 1988 .

[3]  Juliette Blevins,et al.  The origins of consonant-vowel metathesis , 1998 .

[4]  Jennifer S. Pardo,et al.  Perceiving the causes of coarticulatory acoustic variation: Consonant voicing and vowel pitch , 1997, Perception & psychophysics.

[5]  Juliette Blevins,et al.  Evolutionary Phonology: Beyond phonology , 2004 .

[6]  V. Mann,et al.  Influence of vocalic context on perception of the [ʃ]-[s] distinction , 1978 .

[7]  C. Fowler,et al.  A critical examination of the spectral contrast account of compensation for coarticulation , 2009, Psychonomic bulletin & review.

[8]  C. Fowler Compensation for coarticulation reflects gesture perception, not spectral contrast , 2006, Perception & psychophysics.

[9]  Holger Mitterer,et al.  On the causes of compensation for coarticulation: Evidence for phonological mediation , 2006, Perception & psychophysics.

[10]  Juliette Blevins,et al.  The evolution of metathesis , 2004 .

[11]  M. Pitt,et al.  Lexical activation (and other factors) can mediate compensation for coarticulation , 2003 .

[12]  Dennis Norris,et al.  The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. , 2006, Psychological review.

[13]  Carol A. Fowler,et al.  Young infants’ perception of liquid coarticulatory influences on following stop consonants , 1990, Perception & psychophysics.

[14]  A. Yuille,et al.  Vision as Bayesian inference: analysis by synthesis? , 2006, Trends in Cognitive Sciences.

[15]  R. Diehl,et al.  Phonology and Phonetic Evidence: Intermediate properties in the perception of distinctive feature values , 1995 .

[16]  Robert Kirchner,et al.  Phonetically Based Phonology , 2008 .

[17]  T. Jaeger,et al.  Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. , 2008, Journal of memory and language.

[18]  R. Harald Baayen,et al.  Analyzing linguistic data: a practical introduction to statistics using R, 1st Edition , 2008 .

[19]  R. Jacobs,et al.  Perception of speech reflects optimal use of probabilistic speech cues , 2008, Cognition.

[20]  Alan C. L. Yu,et al.  Explaining final obstruent voicing in Lezgian: Phonetics and history , 2004 .

[21]  M. Landy,et al.  Optimal Compensation for Changes in Task-Relevant Movement Variability , 2005, The Journal of Neuroscience.

[22]  A. Lotto,et al.  Neighboring spectral content influences vowel identification. , 2000, The Journal of the Acoustical Society of America.

[23]  C. Laeufer,et al.  Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV , 1995 .

[24]  Naomi H. Feldman,et al.  A Rational Account of the Perceptual Magnet Effect , 2007 .

[25]  D. Norris,et al.  Shortlist B: a Bayesian model of continuous speech recognition. , 2008, Psychological review.

[26]  V. Fromkin,et al.  Tone : a linguistic survey , 1980 .

[27]  V. Mann Influence of preceding liquid on stop-consonant perception. , 1980, Perception & psychophysics.

[28]  A. Lotto Perceptual Compensation for Coarticulation as a General Auditory Process , 2004 .

[29]  Elisabeth Dévière,et al.  Analyzing linguistic data: a practical introduction to statistics using R , 2009 .

[30]  Darya Kavitskaya,et al.  Compensatory Lengthening: Phonetics, Phonology, Diachrony , 2002 .

[31]  R. Diehl,et al.  On the Objects of Speech Perception , 1989 .

[32]  A. Lotto,et al.  General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification , 1998, Perception & psychophysics.

[33]  J. Tenenbaum,et al.  Generalization, similarity, and Bayesian inference. , 2001, The Behavioral and brain sciences.

[34]  K. Holyoak,et al.  Induction of category distributions: a framework for classification learning. , 1984, Journal of experimental psychology. Learning, memory, and cognition.

[35]  A. Lotto,et al.  Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). , 1997, The Journal of the Acoustical Society of America.

[36]  Jean-Marie Hombert,et al.  Consonant Types, Vowel Quality, and Tone , 1978 .

[37]  Virginia A. Mann,et al.  Distinguishing universal and language-dependent levels of speech perception: Evidence from Japanese listeners' perception of English “l” and “r” , 1986, Cognition.

[38]  R. Krakow,et al.  Perception of coarticulatory nasalization by speakers of English and Thai: evidence for partial compensation. , 1999, The Journal of the Acoustical Society of America.

[39]  B H Repp,et al.  Influence of Vocalic Context on Perception of the [ʃ]-[s] Distinction: II. Spectral Factors .

[40]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[41]  A. Lotto,et al.  Putting phonetic context effects into context: A commentary on Fowler (2006) , 2006, Perception & psychophysics.

[42]  James D. Harnsberger,et al.  Language-specific patterns of vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates , 2002, J. Phonetics.

[43]  J Kingston,et al.  The Phonetics and Phonology of Perceptually Motivated Articulatory Covariation , 1992, Language and speech.

[44]  Jonathan Barnes,et al.  Strength and weakness at the interface : positional neutralization in phonetics and phonology , 2006 .

[45]  C A Fowler,et al.  Listeners do hear sounds, not tongues. , 1996, The Journal of the Acoustical Society of America.

[46]  John R. Anderson,et al.  The Adaptive Character of Thought , 1990 .