Prosody and Intention Recognition

Listeners face multiple challenges in mapping prosody onto intentions: the relevant intentions vary with the general context of an utterance (e.g., the speaker’s goals), and the realization of prosodic contours varies across speakers, accents, and speech conditions. We propose that listeners map acoustic information onto prosodic representations using (rational) probabilistic inference, in the form of generative models that are updated on the fly based on the match between predictions and the input. We review ongoing work motivated by this framework, focusing on the “It looks like an X” construction, which, depending on the pitch contour and context, can be interpreted as “It looks like an X and it is” or “It looks like an X and it isn’t.” We use this construction to test the hypothesis that pragmatic processing shows the pattern of adaptation effects expected if the mapping of speech onto intentions involves rational inference.
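To make the idea of rational inference with on-the-fly updating concrete, the following is a minimal illustrative sketch, not the model used in the work reviewed here. It assumes a single scalar acoustic cue (a hypothetical normalized pitch measure), Gaussian cue distributions for each of the two interpretations, and a conjugate normal-normal update of each category mean. All names (`CueBelief`, `posterior`, the labels "is" and "isnt") and all numeric values are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CueBelief:
    """Listener's Gaussian belief about the mean cue value for one intention."""
    mean: float       # current estimate of the category mean (hypothetical units)
    var: float        # uncertainty about that mean
    noise_var: float  # assumed within-category variability of the cue

    def likelihood(self, cue: float) -> float:
        # Predictive density of the cue under this intention: N(mean, var + noise_var)
        v = self.var + self.noise_var
        return math.exp(-(cue - self.mean) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    def update(self, cue: float) -> None:
        # Conjugate normal-normal update: shift the mean by a fraction of the
        # prediction error and shrink the uncertainty about the mean.
        gain = self.var / (self.var + self.noise_var)
        self.mean += gain * (cue - self.mean)
        self.var *= 1 - gain


def posterior(cue, beliefs, prior):
    """Bayes' rule over intentions given one observed cue value."""
    joint = {name: prior[name] * b.likelihood(cue) for name, b in beliefs.items()}
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}


# Illustrative starting beliefs (assumed values, not estimates from data).
beliefs = {
    "is": CueBelief(mean=0.0, var=1.0, noise_var=1.0),
    "isnt": CueBelief(mean=2.0, var=1.0, noise_var=1.0),
}
prior = {"is": 0.5, "isnt": 0.5}

ambiguous_cue = 1.0  # halfway between the two initial category means
before = posterior(ambiguous_cue, beliefs, prior)

# Exposure: context makes clear that this speaker intends "isnt", and the
# speaker realizes it with a more extreme cue value than initially expected.
for _ in range(5):
    beliefs["isnt"].update(3.0)

after = posterior(ambiguous_cue, beliefs, prior)
print(before, after)
```

In this sketch the same ambiguous cue is initially equally compatible with both interpretations. After exposure to a speaker who marks the "isn't" reading more strongly, the listener's expectations for that reading shift, so the ambiguous cue becomes more likely to be interpreted as "is". This is the qualitative pattern of adaptation that the rational-inference hypothesis predicts under these simplifying assumptions.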
