Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models

Language is crucial for human intelligence, but what exactly is its role? We take language to be a part of a system for understanding and communicating about situations. In humans, these abilities emerge gradually from experience and depend on domain-general principles of biological neural networks: connection-based learning, distributed representation, and context-sensitive, mutual constraint satisfaction-based processing. Current artificial language processing systems rely on the same domain-general principles, embodied in artificial neural networks. Indeed, recent progress in this field depends on query-based attention, which extends the ability of these systems to exploit context and has contributed to remarkable breakthroughs. Nevertheless, most current models focus exclusively on language-internal tasks, limiting their ability to perform tasks that depend on understanding situations. These systems also lack memory for the contents of prior situations outside of a fixed contextual span. We describe the organization of the brain’s distributed understanding system, which includes a fast learning system that addresses the memory problem. We sketch a framework for future models of understanding, drawing equally on cognitive neuroscience and artificial intelligence and exploiting query-based attention. We highlight relevant current directions and consider further developments needed to fully capture human-level language understanding in a computational system.
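To make the notion of query-based attention concrete, the sketch below implements the standard scaled dot-product form in plain NumPy. This is a minimal illustration under our own assumptions, not code from the paper; the function names and the toy dimensions are illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_based_attention(queries, keys, values):
    # Scaled dot-product attention: each query is compared against all keys,
    # and the resulting weights mix the corresponding values, so every
    # position's output reflects the full surrounding context.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # query-key similarities
    weights = softmax(scores, axis=-1)         # context-sensitive attention weights
    return weights @ values                    # weighted blend of value vectors

# Toy usage: self-attention over 4 context tokens with 8-dimensional vectors.
rng = np.random.default_rng(0)
context = rng.normal(size=(4, 8))
output = query_based_attention(context, context, context)
print(output.shape)  # (4, 8)
```

Because each position's query is matched against every key in the window, its output is a context-sensitive blend of the whole input, which is the sense in which query-based attention extends a model's ability to exploit context.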
