The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks

Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance in recent years for tasks such as question answering, sentiment analysis, and textual similarity. Extensive work shows how accurately such models can represent abstract, semantic information present in text. In this expository work, we explore a tangential direction and analyze such models' performance on tasks that require a more granular level of representation. We focus on the problem of textual similarity from two perspectives: matching documents at a granular level (requiring embeddings to capture fine-grained attributes in the text) and at an abstract level (requiring embeddings to capture overall textual semantics). We empirically demonstrate, across two datasets from different domains, that while contextual embeddings perform well on abstract document matching as expected, they are consistently (and at times, vastly) outperformed by simple baselines like TF-IDF on more granular tasks. We then propose a simple but effective method to incorporate TF-IDF into models that use contextual embeddings, achieving relative improvements of up to 36% on granular tasks.
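
The abstract does not spell out how TF-IDF is incorporated into the contextual-embedding model. As a minimal sketch, assuming a simple weighted interpolation of cosine similarities and the sentence-transformers library as the contextual-embedding backend (the model name and the `alpha` weight below are illustrative assumptions, not the paper's exact formulation), the idea might look like this:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer  # assumed embedding backend

def combined_similarity(docs_a, docs_b, alpha=0.5):
    """Interpolate TF-IDF and contextual-embedding cosine similarities.

    `alpha` weights the lexical (TF-IDF) signal; the interpolation scheme
    is an illustrative assumption, not the paper's exact method.
    """
    # Sparse lexical signal: captures fine-grained, token-level attributes.
    tfidf = TfidfVectorizer().fit(docs_a + docs_b)
    sim_lex = cosine_similarity(tfidf.transform(docs_a), tfidf.transform(docs_b))

    # Dense semantic signal: captures abstract, document-level meaning.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice
    sim_sem = cosine_similarity(model.encode(docs_a), model.encode(docs_b))

    # Weighted combination of the two pairwise similarity matrices.
    return alpha * sim_lex + (1 - alpha) * sim_sem
```

In a setup like this, `alpha` would be tuned on a validation set: granular matching tasks should benefit from a larger lexical weight, while abstract matching should favor the embedding signal.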
