Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve

Named Entity Recognition systems achieve remarkable performance on domains such as English news. It is natural to ask: what are these models actually learning to achieve this? Are they merely memorizing the names themselves, or are they capable of interpreting the text and inferring the correct entity type from the linguistic context? We examine these questions by contrasting the performance of several variants of LSTM-CRF architectures for named entity recognition, some of which are provided only representations of the context as features. We also perform similar experiments for BERT. We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves. We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that, although humans are likewise unable to infer the entity type for the majority of the errors made by the context-only system, there is still some room for improvement. A system should be able to correctly recognize any name appearing in a predictive context, and our experiments indicate that current systems could be further improved by such a capability.
