Assessing Phrasal Representation and Composition in Transformers

Deep transformer models have pushed performance on NLP tasks to new limits, suggesting sophisticated treatment of complex linguistic inputs such as phrases. However, we have limited understanding of how these models represent phrases, and whether their representations reflect sophisticated composition of phrase meaning like that done by humans. In this paper, we present a systematic analysis of phrasal representations in state-of-the-art pre-trained transformers. We use tests that leverage human judgments of phrase similarity and meaning shift, and compare results before and after controlling for word overlap, to tease apart lexical effects from composition effects. We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition. We also identify variation in phrase representation quality across models, layers, and representation types, and make corresponding recommendations for the use of representations from these models.
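
To make the analysis concrete, the Python sketch below illustrates the general recipe behind such similarity correlation tests: pool token vectors from a chosen layer of a pre-trained transformer to get phrase representations, score phrase pairs by cosine similarity, and correlate those scores with human judgments. This is a minimal sketch, not the paper's exact pipeline; the model, mean-pooling strategy, layer choice, and the tiny pairs/human_scores data are illustrative assumptions.

import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def phrase_embedding(phrase, layer=8):
    # Mean-pool token vectors from one hidden layer as the phrase representation
    # (pooling strategy and layer choice are assumptions for illustration).
    inputs = tokenizer(phrase, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # embedding layer + one tensor per transformer layer
    return hidden_states[layer].squeeze(0).mean(dim=0)

# Hypothetical phrase pairs with made-up human relatedness ratings.
pairs = [("heavy rain", "torrential downpour"),
         ("heavy rain", "light drizzle"),
         ("red tape", "bureaucratic procedure")]
human_scores = [4.5, 2.0, 4.0]

model_scores = [torch.cosine_similarity(phrase_embedding(a), phrase_embedding(b), dim=0).item()
                for a, b in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print("Spearman correlation with human judgments:", rho)

A word-overlap control in the spirit of the paper would compare such correlations separately on pairs that do and do not share words, to check whether apparent composition effects reduce to lexical overlap.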
