Uncovering Main Causalities for Long-tailed Information Extraction

Information Extraction (IE) aims to extract structured information from unstructured text. In practice, long-tailed distributions caused by the selection bias of a dataset may lead conventional likelihood models to learn incorrect correlations, also known as spurious correlations, between entities and labels. This motivates us to propose counterfactual IE (CFIE), a novel framework that aims to uncover the main causalities behind the data from the perspective of causal inference. Specifically, 1) we first introduce a unified structural causal model (SCM) for various IE tasks, describing the relationships among variables; 2) with our SCM, we then generate counterfactuals based on an explicit language structure to better calculate the direct causal effect during the inference stage; 3) we further propose a novel debiasing approach to yield more robust predictions. Experiments on three IE tasks across five public datasets show the effectiveness of our CFIE model in mitigating spurious correlations.
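
As a rough illustration of the idea in step 2, the sketch below contrasts the scores a model assigns to the original input with the scores it assigns to a counterfactual input, and predicts from the difference. It is a generic counterfactual-debiasing sketch, not the CFIE formulation: the function name `debiased_prediction`, the scaling parameter `alpha`, and the toy scores are all assumptions made for illustration.

```python
from typing import List


def debiased_prediction(
    factual: List[float],
    counterfactual: List[float],
    alpha: float = 1.0,
) -> int:
    """Return the label index with the largest estimated direct effect.

    factual:        per-label scores for the original input.
    counterfactual: per-label scores for a counterfactual input
                    (hypothetically, one where the target entity is masked,
                    so the scores mostly reflect dataset bias).
    alpha:          assumed knob controlling how much bias is subtracted.
    """
    if len(factual) != len(counterfactual):
        raise ValueError("score lists must have the same length")
    # Estimated effect of the input on each label, with the bias removed.
    effect = [f - alpha * c for f, c in zip(factual, counterfactual)]
    # Index of the maximum effect (first one on ties).
    return max(range(len(effect)), key=effect.__getitem__)


# Toy scores: label 0 is a head class that scores high even without the entity.
factual_scores = [2.0, 1.5, 0.1]
counterfactual_scores = [1.8, 0.2, 0.0]

biased_label = max(range(len(factual_scores)), key=factual_scores.__getitem__)
debiased_label = debiased_prediction(factual_scores, counterfactual_scores)
```

With these toy scores, the plain argmax picks the head class (label 0), while subtracting the counterfactual scores shifts the prediction to label 1, whose score depends on the input rather than on the label prior. Setting `alpha` to 0 recovers the original biased prediction.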
