NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date, it contains 56K annotated named entities and 39K annotated relations. Its key difference from previous datasets is the annotation of nested named entities, as well as of relations within nested entities and at the discourse level. NEREL can therefore facilitate the development of novel models that extract relations between nested named entities, and relations at both the sentence and document levels. NEREL also annotates events involving named entities and the roles those entities play in the events. The NEREL collection is available at https://github.com/nerel-ds/NEREL.
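To make "nested named entities" and "relations within nested entities" concrete, the following minimal sketch represents entities as character spans and checks which spans lie inside others. It is an illustration only: the English example text, the `Entity` and `Relation` classes, the helper names, and the labels `PROFESSION`, `CITY`, and `WORKPLACE` are assumptions made for this example, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    label: str   # hypothetical label, not taken from the NEREL schema
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive


@dataclass(frozen=True)
class Relation:
    label: str   # hypothetical label
    head: str    # id of the head entity
    tail: str    # id of the tail entity


def is_nested(inner: Entity, outer: Entity) -> bool:
    """True if `inner` lies strictly inside the span of `outer`."""
    contained = outer.start <= inner.start and inner.end <= outer.end
    same_span = (inner.start, inner.end) == (outer.start, outer.end)
    return contained and not same_span


def nested_pairs(entities: list[Entity]) -> list[tuple[str, str]]:
    """Return (inner_id, outer_id) for every nested pair of entities."""
    return [
        (a.id, b.id)
        for a in entities
        for b in entities
        if a.id != b.id and is_nested(a, b)
    ]


def relations_within_nesting(
    entities: list[Entity], relations: list[Relation]
) -> list[Relation]:
    """Keep relations whose two arguments are nested one inside the other."""
    pairs = set(nested_pairs(entities))
    return [
        r for r in relations
        if (r.head, r.tail) in pairs or (r.tail, r.head) in pairs
    ]


# Illustrative example: an outer entity that contains an inner one.
text = "Mayor of Moscow"
entities = [
    Entity("T1", "PROFESSION", 0, 15),  # "Mayor of Moscow"
    Entity("T2", "CITY", 9, 15),        # "Moscow"
]
relations = [Relation("WORKPLACE", head="T1", tail="T2")]

print(nested_pairs(entities))
print(relations_within_nesting(entities, relations))
```

A flat annotation scheme would have to choose one of the two spans; storing both lets a relation link the outer entity to the entity nested inside it.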
