Paper Information - Bayesian Speech Production: Evidence from Latency and Hyperarticulation

Bayesian Speech Production: Evidence from Latency and Hyperarticulation

Christo Kirov (kirov@cogsci.jhu.edu)
Department of Cognitive Science, 3400 N. Charles Street, Baltimore, MD 21218 USA

Colin Wilson (colin@cogsci.jhu.edu)
Department of Cognitive Science, 3400 N. Charles Street, Baltimore, MD 21218 USA

Abstract

Both response latency and phonetic variation reflect competition among alternatives during the speech production process. A review of the literature finds an apparent contradiction in the latency results. In some tasks where latency is measured, similarity between targets and competitors results in slower reaction times. In other tasks, similar competitors appear to facilitate production times relative to non-similar competitors (though a lack of any competition at all results in the shortest response latencies). With respect to phonetic realization, experiments suggest that high levels of competition induced by sufficiently similar competitors result in hyperarticulation of target utterances. We present a Bayesian model of speech production that formalizes the selection and planning of spoken forms as noisy-channel communication among different levels of processing. The model resolves the apparent contradiction found in the latency results, and establishes a novel connection between those results and observed patterns of hyperarticulation.

Keywords: Speech production; competition; Bayesian modeling

Introduction

Competition among alternatives, and the need to resolve competition efficiently and correctly, are pervasive in speech perception and speech production (e.g., Luce & Pisoni, 1998; Marslen-Wilson & Zwitserlood, 1989; Dell & Gordon, 2003). A number of studies have examined how such competitive processes are reflected in the time it takes to plan speech, and in the fine-grained phonetic realization of speech sounds. The goal of this paper is to develop a unified explanation of these potentially conflicting results, which have typically been treated independently.

In various speech production tasks, response latency is affected by the relationship between the target response and any primes, distractors, or competitors in the experimental speech environment (e.g., masked priming (Ferrand et al., 1996), plan switching (Meyer & Gordon, 1985; Yaniv et al., 1990), cue-distractor tasks (Gordon & Meyer, 1984; Galantucci et al., 2009; Roon, 2012)). A review of these results reveals an apparent contradiction with respect to how similarity between targets and competitors affects response latency.

In some production tasks, similarity between target utterances and competitors results in delayed (longer) response latencies (Meyer & Gordon, 1985; Yaniv, Meyer, Gordon, Huff, & Sevald, 1990; Roelofs, 1999). One example is the plan-switching task (Meyer & Gordon, 1985), in which participants are prompted to plan to say one form (e.g., the syllable UP), but are sometimes cued to say an alternative (e.g., the syllable UB). The findings from this task are summarized in Table 1. When the alternative response is highly similar to the originally planned target response, the time to initiate the alternative is lengthened. This effect drops off rapidly with increased phonological/phonetic distance. Only alternative responses that are about one feature away from the target seem to induce a significant delay.

Table 1: Plan Switching Task: Similarity = Higher Latency

Planned   Alternative   Difference        Latency
UP        UB            voicing           high
UP        UT            place             high
UP        UD            voicing + place   low

In cue-distractor tasks, on the other hand, similarity seems to play the opposite role (Gordon & Meyer, 1984; Galantucci et al., 2009; Roon, 2012). In a cue-distractor task, participants are taught to associate a visual cue with a particular verbal response (e.g., the syllable KA or GA). Upon receiving the cue, the participant attempts to produce the associated response as quickly as possible. However, before the subject is able to initiate speech (e.g., at 200 ms after the cue), an auditory or visual distractor is presented (e.g., the syllable PA).

In spite of the fact that the subject has been given instructions to ignore the distractor, it has an effect on response latency as summarized in Table 2. It seems that when the distractor is sufficiently similar to the target response, production is facilitated relative to the case when the distractor is at a greater distance. However, it is always the case that the presentation of a distractor, no matter how it is related to the target, results in some production delay relative to the no-distractor case.

Table 2: Cue-Distractor Task: Similarity = Lower Latency

Response   Distractor   Difference        Latency
KA         none         NA                minimal
KA         GA           voicing           low
KA         TA           place             low
KA         DA           voicing + place   high

Finally, high levels of competition have been shown to influence phonetic realization: salient competitors in the speech environment give rise to hyperarticulation of spoken forms.


[1] Louis Goldstein, et al. Perceptuomotor compatibility effects in speech, 2009, Attention, perception & psychophysics.

[2] D. Meyer, et al. Vowel similarity, connectionist models, and syllable structure in motor programming of speech, 1990.

[3] Jason M. Brenier, et al. Predictability Effects on Durations of Content and Function Words in Conversational English, 2009.

[4] D. Meyer, et al. Speech production: Motor programming of phonetic features, 1985.

[5] P. C. Gordon, et al. Perceptual-motor processing of phonetic features in speech, 1984, Journal of experimental psychology. Human perception and performance.

[6] A. Roelofs, et al. The WEAVER model of word-form encoding in speech production, 1997, Cognition.

[7] Christo Kirov, et al. The Specificity of Online Variation in Speech Production, 2012, CogSci.

[8] Eero P. Simoncelli, et al. Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics, 2011, Nature Neuroscience.

[9] D. Norris, et al. Shortlist B: a Bayesian model of continuous speech recognition, 2008, Psychological review.

[10] Naomi H. Feldman, et al. The influence of categories on perception: explaining the perceptual magnet effect as optimal statistical inference, 2009, Psychological review.

[11] D. Pisoni, et al. Recognizing Spoken Words: The Neighborhood Activation Model, 1998, Ear and hearing.

[12] Gary S. Dell, et al. Neighbors in the lexicon: Friends or foes?, 2003.

[13] Ardi Roelofs, et al. Phonological Segments and Features as Planning Units in Speech Production, 1999.

[14] B. Munson, et al. The effect of phonological neighborhood density on vowel articulation, 2004, Journal of speech, language, and hearing research: JSLHR.

[15] R. Wright. Phonetic Interpretation Papers in Laboratory Phonology VI: Factors of lexical competition in vowel articulation, 2004.

[16] Stephen D. Goldinger, et al. Lexical neighborhoods in speech production: A first report, 1989.

[17] W. Marslen-Wilson. Accessing Spoken Words: The Importance of Word Onsets, 2004.

[18] Matthew Goldrick, et al. Mechanisms of interaction in speech production, 2009, Language and cognitive processes.

[19] Adam Binch, et al. Perception as Bayesian Inference, 2014.

[20] Kevin D. Roon. The dynamics of phonological planning, 2013.

[21] J. Grainger, et al. Masked Priming of Word and Picture Naming: The Role of Syllabic Units, 1996.

[22] Adam N. Sanborn, et al. Rational approximations to rational models: alternative algorithms for category learning, 2010, Psychological review.