Revisiting the Uniform Information Density Hypothesis

The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across the signal. While its implications for language production have been well explored, the hypothesis potentially makes predictions about language comprehension and linguistic acceptability as well. Further, it is unclear how uniformity in a linguistic signal—or lack thereof—should be measured, and over which linguistic unit, e.g., the sentence or language level, this uniformity should hold. Here we investigate these facets of the UID hypothesis using reading time and acceptability data. While our reading time results are generally consistent with previous work, they are also consistent with a weakly super-linear effect of surprisal, which would be compatible with UID's predictions. For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability. We then explore multiple operationalizations of UID, motivated by different interpretations of the original hypothesis, and analyze the scope over which the pressure towards uniformity is exerted. The explanatory power of a subset of the proposed operationalizations suggests that the strongest trend may be a regression towards a mean surprisal across the language, rather than the phrase, sentence, or document—a finding that supports a typical interpretation of UID, namely that it is the byproduct of language users maximizing the use of a (hypothetical) communication channel.
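To make the contrast between operationalizations concrete, the sketch below computes two candidate (non-)uniformity measures over per-token surprisals: variance within a local unit such as a sentence, and mean squared deviation from a language-level mean surprisal. This is an illustrative sketch only; the function names and the toy surprisal values are hypothetical, not the paper's actual measures or data.

```python
import math

def uid_variance(surprisals):
    """Local operationalization: variance of surprisals within one unit
    (e.g., a sentence). Lower variance = more uniform signal."""
    mu = sum(surprisals) / len(surprisals)
    return sum((s - mu) ** 2 for s in surprisals) / len(surprisals)

def uid_deviation_from_language_mean(surprisals, language_mean):
    """Global operationalization: mean squared deviation of each token's
    surprisal from a language-wide mean, capturing a pressure to
    'regress toward' the language-level mean surprisal."""
    return sum((s - language_mean) ** 2 for s in surprisals) / len(surprisals)

# Toy per-token surprisals (in bits) for two hypothetical sentences
# with the same mean surprisal but different uniformity.
uniform_sentence = [4.0, 4.1, 3.9, 4.0]
spiky_sentence = [1.0, 9.0, 1.5, 4.5]
```

Under the local measure the two sentences differ sharply even though their average surprisal is identical; the global measure additionally penalizes sentences whose typical surprisal drifts away from the language-wide mean.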
