Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Relation extraction (RE) aims to extract relations between entities from sentences and documents. Existing RE models typically rely on supervised machine learning. However, recent studies have shown that many RE datasets are incompletely annotated. This is known as the false negative problem, in which valid relation instances are incorrectly annotated as 'no_relation'. Models trained on such data inevitably make similar mistakes at inference time. Self-training has proven effective in alleviating the false negative problem, but traditional self-training is vulnerable to confirmation bias and performs poorly on minority classes. To overcome this limitation, we propose a novel class-adaptive re-sampling self-training framework (CAST). Specifically, we re-sample the pseudo-labels of each class according to its precision and recall scores. Our re-sampling strategy favors the pseudo-labels of classes with high precision and low recall, which improves overall recall without significantly compromising precision. We conduct experiments on document-level and biomedical relation extraction datasets, and the results show that our self-training framework consistently outperforms existing competitive methods on the Re-DocRED and ChemDisgene datasets when the training data are incompletely annotated. Our code is released at https://github.com/DAMO-NLP-SG/CAST.
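The core of the framework is the class-adaptive re-sampling step: pseudo-labels from classes that the current model predicts precisely but under-predicts (high precision, low recall) are kept at a higher rate, so they contribute more to the next round of self-training. Below is a minimal sketch of this idea; the function names (`class_sampling_rates`, `resample_pseudo_labels`) and the product-form rate `precision * (1 - recall)^alpha` are illustrative assumptions, not the paper's exact weighting.

```python
import random

def class_sampling_rates(precision, recall, alpha=1.0):
    # Favor classes whose pseudo-labels are precise (high precision)
    # but under-predicted (low recall). This product form is an
    # illustrative assumption, not the paper's exact formula.
    return {c: precision[c] * (1.0 - recall[c]) ** alpha for c in precision}

def resample_pseudo_labels(pseudo_labels, rates, seed=0):
    # Keep each pseudo-labeled instance (example, class) with a
    # probability equal to its class's sampling rate.
    rng = random.Random(seed)
    return [(x, c) for x, c in pseudo_labels if rng.random() < rates.get(c, 0.0)]

if __name__ == "__main__":
    # Per-class precision/recall would be estimated on a held-out dev set;
    # the class names and scores here are hypothetical.
    precision = {"founded_by": 0.92, "located_in": 0.75}
    recall = {"founded_by": 0.40, "located_in": 0.85}
    rates = class_sampling_rates(precision, recall)
    pseudo = [("doc1-pair3", "founded_by"), ("doc2-pair1", "located_in")]
    print(rates)
    print(resample_pseudo_labels(pseudo, rates))
```

Under this weighting, a high-precision, low-recall class such as the hypothetical "founded_by" above is re-sampled at a higher rate than a class the model already recalls well, which is how the framework raises overall recall without flooding training with imprecise pseudo-labels.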
