Cue Effectiveness in Communicatively Efficient Discourse Production

Recent years have seen a surge in information-theoretic accounts that treat language production as partially driven by a preference for communicative efficiency. Evidence from discourse production (i.e., production beyond the sentence level) has been taken to suggest that speakers distribute information across a discourse so as to keep the conditional per-word entropy constant, which would facilitate efficient information transfer (Genzel & Charniak, 2002). This hypothesis implies that the conditional (contextualized) probabilities of linguistic units affect speakers' preferences during production. Here, we extend this work in two ways. First, we explore how preceding cues are integrated into contextualized probabilities, a question that has so far received little attention. Specifically, we investigate how a cue's maximal informativity about upcoming words (the cue's effectiveness) decays as a function of the cue's recency. Based on properties of linguistic discourse and of human memory, we analytically derive a model of cue effectiveness decay and evaluate it against cross-linguistic data from 12 languages. Second, we connect information-theoretic accounts of discourse production to well-established mechanistic (activation-based) accounts by relating contextualized probability distributions over words to the words' relative activation in a lexical network, given the preceding discourse.
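To make these two ideas concrete, the sketch below illustrates, under assumptions that are ours rather than the paper's, how cue recency could feed into contextualized probabilities and per-word surprisal. It assumes (i) that each earlier mention of a word contributes a strength that decays as a power law of its distance, age^(-d), with d = 0.5 (the conventional default for base-level decay in ACT-R), and (ii) that summed strengths are turned into a probability distribution by simple Luce-choice normalization, loosely in the spirit of cache-based language models (Kuhn & De Mori, 1990). The function names (`cue_strength`, `contextual_distribution`, `surprisal_bits`, `cue_effect_bits`), the `cue_weight` parameter, and the uniform prior are hypothetical; this is not the cue effectiveness decay model derived in this work.

```python
import math


def cue_strength(ages, decay=0.5):
    """Summed power-law strength of earlier mentions: sum_j age_j ** -decay.

    ages: distances (in words) back to each earlier mention; each must be >= 1.
    decay: assumed decay exponent d; a larger d means faster forgetting.
    """
    return sum(age ** -decay for age in ages)


def contextual_distribution(history, vocab, prior, decay=0.5, cue_weight=1.0):
    """Return P(w | history) proportional to prior[w] + cue_weight * cue_strength.

    prior: hypothetical context-free weight for each word in vocab (all > 0).
    Tokens in history that are not in vocab are treated as uninformative filler.
    """
    n = len(history)
    ages = {w: [] for w in vocab}
    for pos, word in enumerate(history):
        if word in ages:
            ages[word].append(n - pos)  # the most recent word has age 1
    scores = {
        w: prior[w] + cue_weight * cue_strength(ages[w], decay) for w in vocab
    }
    total = sum(scores.values())  # Luce-choice normalization
    return {w: s / total for w, s in scores.items()}


def surprisal_bits(word, history, vocab, prior, decay=0.5, cue_weight=1.0):
    """Per-word surprisal, -log2 P(word | history), under the sketch above."""
    p = contextual_distribution(history, vocab, prior, decay, cue_weight)[word]
    return -math.log2(p)


def cue_effect_bits(age, target, vocab, prior, decay=0.5):
    """Bits of surprisal on `target` saved by one mention of it `age` words back."""
    history = [target] + ["<filler>"] * (age - 1)  # "<filler>" is not in vocab
    baseline = surprisal_bits(target, [], vocab, prior, decay)
    return baseline - surprisal_bits(target, history, vocab, prior, decay)


if __name__ == "__main__":
    vocab = ["cat", "dog", "mat", "the", "sat"]
    prior = {w: 1.0 for w in vocab}  # uniform, purely for illustration
    for age in (1, 2, 4, 8):
        print(age, round(cue_effect_bits(age, "cat", vocab, prior), 3))
```

With a uniform prior over a five-word vocabulary, a single earlier mention of the target saves progressively fewer bits of surprisal as its distance grows from 1 to 8 words, which is the qualitative pattern a model of cue effectiveness decay is meant to capture. Replacing the power law with an exponential such as exp(-age/τ) is a one-line change in `cue_strength`; which functional form better describes memory decay is the question addressed in the forgetting literature (e.g., Wixted & Ebbesen, 1991).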

[1]  Austin F. Frank, et al. Speaking Rationally: Uniform Information Density as an Optimal Strategy for Language Production. 2008.

[2]  Martha Palmer, et al. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus. 2005, Natural Language Engineering.

[3]  Allen Newell and Paul S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. 1993.

[4]  Michael I. Jordan, et al. Latent Dirichlet Allocation. 2001, J. Mach. Learn. Res.

[5]  M. Pickering, et al. The Representation of Verbs: Evidence from Syntactic Priming in Language Production. 1998.

[6]  John R. Anderson. Cognitive Skills and Their Acquisition. 2013.

[7]  Heike Wiese, et al. Expecting the Unexpected: Exceptions in Grammar. 2011.

[8]  Eugene Charniak, et al. Entropy Rate Constancy in Text. 2002, ACL.

[9]  Eugene Charniak, et al. Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number. 2003, EMNLP.

[10]  A. Baddeley. The episodic buffer: a new component of working memory? 2000, Trends in Cognitive Sciences.

[11]  J. Wixted, et al. On the Form of Forgetting. 1991.

[12]  Ting Qian, et al. Topic Shift in Efficient Discourse Production. 2011, CogSci.

[13]  Mirjam Ernestus, et al. Articulatory Planning Is Continuous and Sensitive to Informational Redundancy. 2005, Phonetica.

[14]  Susan M. Garnsey, et al. Knowledge of Grammar, Knowledge of Usage: Syntactic Probabilities Affect Pronunciation Variation. 2004.

[15]  R. F. Cancho, et al. The global minima of the communicative energy of natural communication systems. 2007.

[16]  Gaurav Malhotra, et al. Dynamics of structural priming. 2009.

[17]  G. Zipf, et al. The Psycho-Biology of Language. 1936.

[18]  J. Meaning, Sound, and Syntax: Lexical Priming in Sentence Production. 2001.

[19]  John R. Anderson, et al. The role of practice in fact retrieval. 1985.

[20]  Renato De Mori, et al. A Cache-Based Natural Language Model for Speech Recognition. 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[21]  Carlos Gómez Gallo, et al. Incremental Syntactic Planning across Clauses. 2008.

[22]  Thomas Hofmann, et al. Topic-based language models using EM. 1999, EUROSPEECH.

[23]  D. G. MacKay. The Problems of Flexibility, Fluency, and Speed-Accuracy Trade-Off in Skilled Behavior. 1982.

[24]  Sunil Arya, et al. Space-time tradeoffs for approximate nearest neighbor searching. 2009, JACM.

[25]  T. Florian Jaeger, et al. Redundancy and reduction: Speakers manage syntactic information density. 2010, Cognitive Psychology.

[26]  Scott Weinstein, et al. Centering: A Framework for Modeling the Local Coherence of Discourse. 1995, CL.

[27]  Gary S. Dell, et al. Connectionist models of language production: lexical access and grammatical encoding. 1999, Cogn. Sci.

[28]  Gertraud Fenk-Oczlon. Familiarity, information flow, and linguistic form. 2001.

[29]  Michael I. Jordan, et al. The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies. 2010.

[30]  Amy Perfors, et al. Why are some word orders more common than others? A uniform information density account. 2010, NIPS.

[31]  Thomas L. Griffiths, et al. Online Inference of Topics with Latent Dirichlet Allocation. 2009, AISTATS.

[32]  M. Aylett, et al. Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. 2006, The Journal of the Acoustical Society of America.

[33]  Jeanette K. Gundel, et al. Cognitive Status and the Form of Referring Expressions in Discourse. 1993.

[34]  Craige Roberts, et al. Information Structure: Towards an integrated formal theory of pragmatics. 2012.

[35]  Dmitrii Manin, et al. Experiments on predictability of word in context and information rate in natural language. 2006, ArXiv.

[36]  J. Bresnan, et al. Syntactic probabilities affect pronunciation variation in spontaneous speech. 2009, Language and Cognition.

[37]  Thomas L. Griffiths, et al. The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies. 2007, JACM.

[38]  Mirjam Ernestus, et al. Morphological predictability and acoustic duration of interfixes in Dutch compounds. 2007, The Journal of the Acoustical Society of America.

[39]  Tanja Schultz, et al. Dynamic language model adaptation using variational Bayes inference. 2005, INTERSPEECH.

[40]  Slava M. Katz, et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer. 1987, IEEE Trans. Acoust. Speech Signal Process.

[41]  Jennifer E. Arnold, et al. Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. 2015.

[42]  Ronald Rosenfeld, et al. Using story topics for language model adaptation. 1997, EUROSPEECH.

[43]  Jennifer E. Arnold, et al. The Effect of Thematic Roles on Pronoun Use and Frequency of Reference Continuation. 2001.

[44]  Craige Roberts. Information structure in discourse: Towards an integrated formal theory of pragmatics. 1996.

[45]  Betty S. Phillips, et al. Word Frequency and the Actuation of Sound Change. 1984.

[46]  W. Levelt, et al. Speaking: From Intention to Articulation. 1990.

[47]  Mark Steyvers, et al. Finding scientific topics. 2004, Proceedings of the National Academy of Sciences of the United States of America.

[48]  Dan Jurafsky, et al. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. 2003, The Journal of the Acoustical Society of America.

[49]  Ting Qian, et al. Close = Relevant? The Role of Context in Efficient Language Production. 2010, CMCL@ACL.

[50]  Johanna D. Moore, et al. A Computational Cognitive Model of Syntactic Priming. 2017.

[51]  Claude E. Shannon, et al. A Mathematical Theory of Communication. 1948.

[52]  Jan P. H. van Santen, et al. Duration and spectral balance of intervocalic consonants: A case for efficient communication. 2005, Speech Commun.

[53]  G. Dell, et al. Becoming syntactic. 2006, Psychological review.

[54]  Steven T. Piantadosi, et al. Word lengths are optimized for efficient communication. 2011, Proceedings of the National Academy of Sciences.

[55]  Richard L. Lewis, et al. Computational principles of working memory in sentence comprehension. 2006, Trends in Cognitive Sciences.

[56]  G. S. Dell, et al. A spreading-activation theory of retrieval in sentence production. 1986, Psychological review.

[57]  G. Zipf, et al. Relative Frequency as a Determinant of Phonetic Change. 1930.

[58]  Jennifer E. Arnold, et al. The effect of additional characters on choice of referring expression: Everyone counts. 2007, Journal of memory and language.

[59]  Frank Keller, et al. The Entropy Rate Principle as a Predictor of Processing Effort: An Evaluation against Eye-tracking Data. 2004, EMNLP.

[60]  Alice Turk, et al. The Smooth Signal Redundancy Hypothesis: A Functional Explanation for Relationships between Redundancy, Prosodic Prominence, and Duration in Spontaneous Speech. 2004, Language and speech.

[61]  Victor S. Ferreira, et al. Given-New Ordering Effects on the Production of Scrambled Sentences in Japanese. 2003, Journal of psycholinguistic research.

[62]  Roger Levy, et al. Speakers optimize information density through syntactic reduction. 2006, NIPS.

[63]  Jay I. Myung, et al. Assessing the distinguishability of models and the informativeness of data. 2004, Cognitive Psychology.

[64]  T. Jaeger, et al. Evidence for Efficient Language Production in Chinese. 2009.

[65]  Jeanette K. Gundel, et al. Cognitive Status and the form of Referring Expressions in Discourse. 1993, The Oxford Handbook of Reference.

[66]  Jason M. Brenier, et al. Predictability Effects on Durations of Content and Function Words in Conversational English. 2009.

[67]  John R. Anderson, et al. Representation and retention of verbatim information. 1977.

[68]  I. Cuthill, et al. Survey of the Quality of Experimental Design, Statistical Analysis and Reporting of Research Using Animals. 2009, PloS one.

[69]  Ramon Ferrer i Cancho, et al. Decoding least effort and scaling in signal frequency distributions. 2005.

[70]  Ramon Ferrer. The global minima of the communicative energy of natural communication systems. 2007.

[71]  R. Ferrer i Cancho, et al. Zipf's law from a communicative phase transition. 2005.

[72]  S. Piantadosi, et al. Refer efficiently: Use less informative expressions for more predictable meanings. 2009.

[73]  L. Squire, et al. On the course of forgetting in very long-term memory. 1989, Journal of experimental psychology. Learning, memory, and cognition.

[74]  S. Brennan. Centering Attention in Discourse. 1995.

[75]  Willem J. M. Levelt, et al. A theory of lexical access in speech production. 1999, Behavioral and Brain Sciences.

[76]  J. K. Bock. Syntactic persistence in language production. 1986, Cognitive Psychology.

[77]  Richard L. Lewis, et al. An Activation-Based Model of Sentence Processing as Skilled Memory Retrieval. 2005, Cogn. Sci.

[78]  W. J. Levelt, et al. Spoken word production: A theory of lexical access. 2001, Proceedings of the National Academy of Sciences of the United States of America.

[79]  Joan L. Bybee, et al. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. 2002, Language Variation and Change.

[80]  Louis C. W. Pols, et al. How efficient is speech. 2003.

[81]  Martin J. Pickering, et al. The use of visual context during the production of referring expressions. 2010, Quarterly journal of experimental psychology.

[82]  Ardi Roelofs, et al. A Case for Nondecomposition in Conceptually Driven Word Retrieval. 1997.

[83]  Scott Weinstein, et al. Providing a Unified Account of Definite Noun Phrases in Discourse. 1983, ACL.

[84]  Adilson E. Motter, et al. Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words. 2009, PloS one.

[85]  Jerome R. Bellegarda, et al. Statistical language model adaptation: review and perspectives. 2004, Speech Commun.

[86]  Kathryn Bock, et al. Syntactic effects of information availability in sentence production. 1980.

[87]  Ardi Roelofs. Syllable structure effects turn out to be word length effects: Comment on Santiago et al. (2000). 2002.

[88]  David I. Beaver, et al. Lexical Variation in Relativizer Frequency. 2009.