Context Limitations Make Neural Language Models More Human-Like

Language models (LMs) have been used in cognitive modeling as well as engineering studies; they compute information-theoretic complexity metrics that simulate humans' cognitive load during reading. This study highlights a limitation of modern neural LMs as the model of choice for this purpose: there is a discrepancy between their context access capacities and those of humans. Our results showed that constraining the LMs' context access improved their simulation of human reading behavior. We also showed that the LM-human gaps in context access were associated with specific syntactic constructions; incorporating syntactic biases into LMs' context access might enhance their cognitive plausibility.
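The complexity metric referred to above is standardly word-by-word surprisal, and the manipulation of interest is limiting how much preceding context the LM may condition on. The following is a minimal sketch of such a limited-context surprisal computation, assuming a Hugging Face GPT-2 model and a hypothetical fixed-size window of preceding words; the paper's actual LMs, tokenization, and context-limiting procedure may differ.

```python
# Sketch: surprisal under a constrained context window (assumed setup, not
# the paper's exact procedure). Uses an off-the-shelf GPT-2 model.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def limited_context_surprisal(words, max_context=2):
    """Surprisal (in bits) of each word, conditioning on at most
    `max_context` preceding words (a hypothetical truncation scheme)."""
    surprisals = []
    for i, word in enumerate(words):
        context = " ".join(words[max(0, i - max_context):i])
        # Encode the (possibly truncated) context plus the target word.
        context_ids = tokenizer.encode(context) if context else [tokenizer.bos_token_id]
        target_ids = tokenizer.encode((" " + word) if context else word)
        input_ids = torch.tensor([context_ids + target_ids])
        with torch.no_grad():
            logits = model(input_ids).logits
        log_probs = torch.log_softmax(logits, dim=-1)
        # Sum log-probabilities over the target word's subword tokens;
        # the logit at position p predicts the token at position p + 1.
        logp = sum(
            log_probs[0, len(context_ids) + j - 1, tok].item()
            for j, tok in enumerate(target_ids)
        )
        surprisals.append(-logp / math.log(2))  # nats -> bits
    return surprisals

print(limited_context_surprisal("the dog that the cat chased barked".split()))
```

Varying `max_context` (or removing the truncation entirely) and regressing the resulting surprisals against human reading times is one way to probe how context limitations affect the fit to human behavior.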
