Paper information - De-biasing Distantly Supervised Named Entity Recognition via Causal Intervention

Distant supervision tackles the data bottleneck in named entity recognition (NER) by automatically generating training instances via dictionary matching. Unfortunately, learning in distantly supervised NER (DS-NER) is severely dictionary-biased: it suffers from spurious correlations, which undermines the effectiveness and robustness of the learned models. In this paper, we fundamentally explain the dictionary bias via a Structural Causal Model (SCM), categorize it into intra-dictionary and inter-dictionary bias, and identify the causes of each. Based on the SCM, we learn de-biased DS-NER via causal interventions. For intra-dictionary bias, we conduct backdoor adjustment to remove the spurious correlations introduced by the dictionary confounder. For inter-dictionary bias, we propose a causal invariance regularizer that makes DS-NER models more robust to perturbations of the dictionaries. Experiments on four datasets and three DS-NER models show that our method can significantly improve the performance of DS-NER.
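To make the dictionary-matching step concrete, here is a minimal sketch of how distant labels might be produced. It is an illustration under stated assumptions, not the paper's implementation: the function `distant_label`, the parameter `max_len`, the BIO tag format, and the toy dictionary are all hypothetical, and greedy longest-match is only one common matching strategy.

```python
from typing import Dict, List


def distant_label(tokens: List[str], dictionary: Dict[str, str], max_len: int = 5) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against a dictionary.

    `dictionary` maps a lowercased, space-joined mention to an entity type.
    (Hypothetical helper for illustration only.)
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for j in range(min(len(tokens), i + max_len), i, -1):
            mention = " ".join(tokens[i:j]).lower()
            if mention in dictionary:
                etype = dictionary[mention]
                tags[i] = f"B-{etype}"
                for k in range(i + 1, j):
                    tags[k] = f"I-{etype}"
                i = j  # continue after the matched span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy dictionary (assumed); "Angela Merkel" is deliberately missing.
toy_dictionary = {"new york": "LOC", "barack obama": "PER"}
tokens = "Barack Obama met Angela Merkel in New York".split()
tags = distant_label(tokens, toy_dictionary)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this toy run, "Angela Merkel" is tagged `O` only because the dictionary does not contain it. Labels that depend on dictionary coverage in this way are the kind of signal the abstract describes as dictionary bias.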
