Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

Abstract: In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new testbed carefully designed to diagnose Name Regularity Bias of NER models. Our results indicate that all state-of-the-art models we tested show such a bias, with BERT fine-tuned models significantly outperforming feature-based (LSTM-CRF) ones on NRB despite having comparable (sometimes lower) performance on standard benchmarks. To mitigate this bias, we propose a novel model-agnostic training method that adds learnable adversarial noise to some entity mentions, thus forcing models to focus more strongly on the contextual signal and leading to significant gains on NRB. Combining it with two other training strategies, data augmentation and parameter freezing, leads to further gains.
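The idea of perturbing only entity mentions while leaving the context untouched can be illustrated with a small sketch. This is not the authors' implementation: in the method described above the noise is learned adversarially, whereas here it is simply passed in as fixed vectors. The BIO tag format, the function names `mention_spans` and `perturb_mentions`, and the `select_prob` parameter are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mention_spans(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, for BIO-tagged mentions."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        # A new B- tag or an O tag closes any mention that is still open.
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        if tag.startswith("B-"):
            start = i
        elif tag.startswith("I-") and start is None:
            start = i  # tolerate a mention that begins with I-
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def perturb_mentions(
    embeddings: Sequence[Sequence[float]],
    tags: Sequence[str],
    noise: Sequence[Sequence[float]],
    select_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Add noise to the token embeddings of some entity mentions.

    Assumption: `noise` holds one vector per token and stands in for the
    learnable adversarial noise; how that noise is trained is not shown.
    Tokens tagged "O" (the context) are never modified.
    """
    rng = rng or random.Random()
    out = [list(vec) for vec in embeddings]
    for start, end in mention_spans(tags):
        # Each mention is selected independently with probability select_prob.
        if rng.random() < select_prob:
            for i in range(start, end):
                out[i] = [x + d for x, d in zip(out[i], noise[i])]
    return out
```

Calling `perturb_mentions(emb, tags, noise, select_prob=1.0)` perturbs every mention, which is convenient for checking that context tokens stay unchanged. Because the name tokens become less reliable under noise, a model trained on such inputs has more incentive to rely on the surrounding words, which is the intuition the abstract describes.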
