Implicit Representations of Meaning in Neural Language Models

Does the effectiveness of neural language models derive entirely from accurate modeling of surface word co-occurrence statistics, or do these models represent and reason about the world they describe? In the BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity's current properties and relations, and can be manipulated with predictable effects on language generation. Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data.
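The "linear readout" claimed above is the standard linear-probing setup: a linear classifier is fit on frozen contextual states to decode an entity's current property. The sketch below illustrates the idea on synthetic vectors that stand in for BART/T5 encoder states (the property direction `w_true`, dimensions, and noise model are all illustrative assumptions, not the paper's actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500  # hypothetical hidden size and number of entity mentions

# Stand-in for contextual encoder states: each state encodes a binary
# entity property (e.g. "the door is open") along a fixed direction
# w_true, plus isotropic noise. A real probe would use frozen BART/T5
# hidden states at entity-mention tokens instead.
w_true = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
states = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, w_true)

# Linear readout: least-squares fit of a single probe vector.
w_probe, *_ = np.linalg.lstsq(states, 2.0 * labels - 1.0, rcond=None)
preds = (states @ w_probe > 0).astype(int)
print(f"probe accuracy: {(preds == labels).mean():.2f}")
```

If the property is linearly encoded, the probe decodes it near-perfectly; if the representation carried only surface co-occurrence statistics, accuracy would stay near chance. This is the contrast the abstract's claim rests on.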
