Contextual and Non-Contextual Word Embeddings: An In-Depth Linguistic Investigation

In this paper we present a comparison between the linguistic knowledge encoded in the internal representations of a contextual language model (BERT) and a context-independent one (Word2vec). We use a wide set of probing tasks, each corresponding to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a context-independent model. We also find that BERT encodes sentence-level properties even within single-word embeddings, obtaining results comparable or even superior to those obtained with sentence representations.
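As a rough illustration of the probing paradigm described above, the following Python sketch trains a linear probe to predict a simple sentence-level property (token count) from frozen BERT and Word2vec sentence vectors. This is a hedged, minimal sketch, not the authors' actual experimental setup: the model names (bert-base-uncased, word2vec-google-news-300), the mean/average pooling choices, the probed feature, and the scikit-learn probe are all illustrative assumptions.

    import numpy as np
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.linear_model import LinearRegression
    import gensim.downloader as api

    # Toy data: the probed sentence-level property is token count.
    sentences = [
        "The cat sat on the mat .",
        "Colorless green ideas sleep furiously .",
        "She reads .",
        "The committee postponed the vote until after the holidays .",
    ]
    labels = np.array([len(s.split()) for s in sentences])

    # Contextual side: mean-pool BERT's last-layer token vectors
    # into a single fixed-size sentence representation.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    bert.eval()

    def bert_embedding(sentence):
        with torch.no_grad():
            enc = tokenizer(sentence, return_tensors="pt")
            hidden = bert(**enc).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0).numpy()

    # Context-independent side: average pretrained Word2vec word vectors,
    # skipping out-of-vocabulary tokens.
    w2v = api.load("word2vec-google-news-300")

    def w2v_embedding(sentence):
        vecs = [w2v[w] for w in sentence.split() if w in w2v]
        return np.mean(vecs, axis=0)

    # Fit one linear probe per representation; scoring on held-out data
    # would estimate how linearly decodable the property is from each.
    for name, embed in [("BERT", bert_embedding), ("Word2vec", w2v_embedding)]:
        X = np.stack([embed(s) for s in sentences])
        probe = LinearRegression().fit(X, labels)
        print(name, "train R^2:", probe.score(X, labels))

In a real experiment the probe would be evaluated on held-out sentences, and the same recipe would be repeated for each linguistic feature and, on the BERT side, for each layer or for single-word embeddings rather than pooled sentence vectors.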
