De-biasing Distantly Supervised Named Entity Recognition via Causal Intervention

Distant supervision tackles the data bottleneck in NER by automatically generating training instances via dictionary matching. Unfortunately, the learning of DS-NER is severely dictionary-biased, which suffers from spurious correlations and therefore undermines the effectiveness and the robustness of the learned models. In this paper, we fundamentally explain the dictionary bias via a Structural Causal Model (SCM), categorize the bias into intra-dictionary and inter-dictionary biases, and identify their causes. Based on the SCM, we learn de-biased DS-NER via causal interventions. For intra-dictionary bias, we conduct backdoor adjustment to remove the spurious correlations introduced by the dictionary confounder. For inter-dictionary bias, we propose a causal invariance regularizer which will make DS-NER models more robust to the perturbation of dictionaries. Experiments on four datasets and three DS-NER models show that our method can significantly improve the performance of DS-NER.

[1]  Charles Blundell,et al.  Representation Learning via Invariant Causal Mechanisms , 2020, ICLR.

[2]  Hanwang Zhang,et al.  Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect , 2020, NeurIPS.

[3]  Wesley De Neve,et al.  Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations , 2015, NUT@IJCNLP.

[4]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5]  Matthew S. Fritz,et al.  Mediation analysis. , 2019, Annual review of psychology.

[6]  Chao Zhang,et al.  BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision , 2020, KDD.

[7]  Lorenzo Richiardi,et al.  Mediation analysis in epidemiology: methods, interpretation and bias. , 2013, International journal of epidemiology.

[8]  Teng Ren,et al.  Learning Named Entity Tagger using Domain-Specific Dictionary , 2018, EMNLP.

[9]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[10]  Xianpei Han,et al.  A Rigourous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land? , 2020, ArXiv.

[11]  Eduard H. Hovy,et al.  End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF , 2016, ACL.

[12]  Dan Roth,et al.  Design Challenges and Misconceptions in Named Entity Recognition , 2009, CoNLL.

[13]  L. Keele The Statistics of Causal Inference: A View from Political Methodology , 2015, Political Analysis.

[14]  J. Pearl Causal diagrams for empirical research , 1995 .

[15]  Hongyu Lin,et al.  Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model , 2021, AAAI.

[16]  Zhiyuan Liu,et al.  Low-Resource Name Tagging Learned with Weakly Labeled Data , 2019, EMNLP.

[17]  Luo Si,et al.  De-biased Court’s View Generation with Causality , 2020, EMNLP.

[18]  Luke S. Zettlemoyer,et al.  AllenNLP: A Deep Semantic Natural Language Processing Platform , 2018, ArXiv.

[19]  Mélanie Frappier,et al.  The Book of Why: The New Science of Cause and Effect , 2018, Science.

[20]  Xuanjing Huang,et al.  Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning , 2019, ACL.

[21]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[22]  Hanwang Zhang,et al.  Two Causal Principles for Improving Visual Dialog , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Zhiyuan Liu,et al.  Neural Relation Extraction with Selective Attention over Instances , 2016, ACL.

[24]  Guillaume Lample,et al.  Neural Architectures for Named Entity Recognition , 2016, NAACL.

[25]  Joel Nothman,et al.  Named Entity Recognition in Wikipedia , 2009, PWNLP@IJCNLP.

[26]  Min Zhang,et al.  Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning , 2018, COLING.

[27]  Jason Weston,et al.  Large-scale Simple Question Answering with Memory Networks , 2015, ArXiv.

[28]  Illtyd Trethowan Causality , 1938 .

[29]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[30]  Yaojie Lu,et al.  Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks , 2019, ACL.