Revisiting DocRED - Addressing the Overlooked False Negative Problem in Relation Extraction

The DocRED dataset is one of the most popular and widely used benchmarks for document-level relation extraction (RE). It adopts a recommend-revise annotation scheme to make large-scale annotation feasible. However, we find that the annotation of DocRED is incomplete: false negative samples are prevalent. We analyze the causes and effects of this pervasive false negative problem in the DocRED dataset. To address the shortcoming, we re-annotate 4,053 documents in the DocRED dataset, adding the missed relation triples back to the original DocRED. We name our revised dataset Re-DocRED. We conduct extensive experiments with state-of-the-art neural models on both datasets, and the results show that models trained and evaluated on Re-DocRED achieve performance improvements of around 13 F1 points. Moreover, we propose different metrics to evaluate the document-level RE task more comprehensively.
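For context on why missing labels matter: document-level RE benchmarks such as DocRED score systems by micro-averaged F1 over the set of extracted (head entity, relation, tail entity) triples, so triples missed by annotators both depress recall during training supervision and wrongly penalize correct predictions as false positives at test time. Below is a minimal sketch of that triple-level scoring; the `micro_f1` helper and the toy triples are ours, for illustration only.

```python
from typing import Set, Tuple

# A relation triple: (head entity, relation type, tail entity).
Triple = Tuple[str, str, str]

def micro_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall, and micro-F1 over sets of extracted relation triples.

    With an incomplete gold standard, a correct prediction that annotators
    missed is absent from `gold` and is therefore counted against precision.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy example: the model also predicts a (plausibly true) triple the
# annotators missed, so measured precision and F1 understate its quality.
gold = {("Alice", "employer", "Acme")}
predicted = {("Alice", "employer", "Acme"), ("Acme", "headquarters", "Berlin")}
print(micro_f1(predicted, gold))  # (0.5, 1.0, 0.666...)
```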
