NEREL: A Russian Dataset with Nested Named Entities and Relations

In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date, it contains 56K annotated named entities and 39K annotated relations. It differs from previous datasets chiefly in its annotation of nested named entities, as well as of relations within nested entities and at the discourse level. NEREL can facilitate the development of novel models that extract relations between nested named entities, as well as relations at both the sentence and document levels. NEREL also annotates events involving named entities and the roles these entities play in them. The NEREL collection is available at https://github.com/nerel-ds/NEREL.
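To make the notion of nested entities and of relations within nested entities concrete, the following is a minimal sketch. It does not reproduce NEREL's actual file format or label inventory. Entities are modelled as character-offset spans, one span may lie entirely inside another, and a relation may link an outer entity to an entity nested within it. The example phrase, the `Entity`/`Relation` classes, the helper functions, and the labels ORGANIZATION, CITY and HEADQUARTERED_IN are all hypothetical illustrations.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    id: str
    label: str   # hypothetical entity type, not necessarily a NEREL label
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive

    def contains(self, other: "Entity") -> bool:
        """True if `other` lies strictly inside this entity's span."""
        return (self.start <= other.start and other.end <= self.end
                and (self.start, self.end) != (other.start, other.end))


@dataclass(frozen=True)
class Relation:
    label: str   # hypothetical relation type
    head: str    # id of the first argument
    tail: str    # id of the second argument


# "Moscow City Hall announced new measures."
text = "Мэрия Москвы объявила о новых мерах."
entities = [
    Entity("T1", "ORGANIZATION", 0, 12),  # "Мэрия Москвы" (Moscow City Hall)
    Entity("T2", "CITY", 6, 12),          # "Москвы" (Moscow), nested inside T1
]
relations = [Relation("HEADQUARTERED_IN", "T1", "T2")]


def nested_pairs(ents):
    """Return (outer_id, inner_id) for every pair where one span contains the other."""
    return [(outer.id, inner.id)
            for a, b in combinations(ents, 2)
            for outer, inner in ((a, b), (b, a))
            if outer.contains(inner)]


def relations_within_nested(ents, rels):
    """Keep only relations whose two arguments form a nested pair."""
    pairs = set(nested_pairs(ents))
    return [r for r in rels
            if (r.head, r.tail) in pairs or (r.tail, r.head) in pairs]


if __name__ == "__main__":
    for e in entities:
        print(e.id, e.label, repr(text[e.start:e.end]))
    print(nested_pairs(entities))
    print(relations_within_nested(entities, relations))
```

A flat BIO tagger assigns exactly one label per token, so it cannot mark "Москвы" both as part of the ORGANIZATION and as a CITY. Representing entities as possibly overlapping spans avoids this limitation, and it also lets a relation connect an outer entity with an entity nested inside it.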
