Speakers optimize information density through syntactic reduction

If language users are rational, they might choose to structure their utterances so as to optimize communicative properties. In particular, information-theoretic and psycholinguistic considerations suggest that this may include maximizing the uniformity of information density in an utterance. We investigate this possibility in the context of syntactic reduction, where the speaker has the option of either marking a higher-order unit (a phrase) with an extra word, or leaving it unmarked. We demonstrate that speakers are more likely to reduce less information-dense phrases. In a second step, we combine a stochastic model of structured utterance production with a logistic-regression model of syntactic reduction to study which types of cues speakers employ when estimating the predictability of upcoming elements. We demonstrate that the trend toward predictability-sensitive syntactic reduction (Jaeger, 2006) is robust in the face of a wide variety of control variables, and present evidence that speakers use both surface and structural cues for predictability estimation.
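To make the uniformity argument concrete, the following is a minimal sketch rather than the model used in the paper. It assumes a deliberately simplified setting in which an optional marker such as "that" is a perfect cue to the clause onset, and it ignores the probability of choosing the marker itself. The function names and all probabilities are hypothetical, chosen only for illustration.

```python
import math


def surprisal(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)


def peak_information(p_clause, p_first_word, marked):
    """Highest per-word surprisal at a clause onset (toy model, not the paper's).

    p_clause: probability that a clause begins here, given the context.
    p_first_word: probability of the clause's first word, given that a clause begins.
    marked: True if the speaker inserts an optional marker such as "that".
    """
    if marked:
        # Assumption: the marker carries the onset information, so the first
        # word of the clause carries only the information about its identity.
        return max(surprisal(p_clause), surprisal(p_first_word))
    # Unmarked: the first word signals both the onset and its own identity.
    return surprisal(p_clause * p_first_word)


def marker_benefit(p_clause, p_first_word):
    """Reduction in peak surprisal (bits) obtained by inserting the marker."""
    unmarked = peak_information(p_clause, p_first_word, marked=False)
    marked = peak_information(p_clause, p_first_word, marked=True)
    return unmarked - marked


# Hypothetical contexts: a predictable clause onset vs. an unpredictable one.
predictable = marker_benefit(p_clause=0.5, p_first_word=0.25)
unpredictable = marker_benefit(p_clause=0.0625, p_first_word=0.25)
print(predictable, unpredictable)
```

Under these toy numbers the marker lowers the information peak by 1 bit when the clause is predictable and by 2 bits when it is not. This matches the direction of the reported effect: omitting the marker costs less when the upcoming phrase carries less information.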
