NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE—a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
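
To make the notion of nesting layers concrete, the following is a minimal sketch that is not taken from the NNE release. It assumes mentions are stored as (start, end, label) token spans with an exclusive end and that spans do not cross. The example sentence, the label names, and the helper functions `contains`, `nesting_layers`, and `max_layers` are all hypothetical illustrations.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (start token, end token exclusive, label).
Mention = Tuple[int, int, str]


def contains(outer: Mention, inner: Mention) -> bool:
    """True if `outer` covers `inner` and the two spans are not identical."""
    same_span = (outer[0], outer[1]) == (inner[0], inner[1])
    return outer[0] <= inner[0] and inner[1] <= outer[1] and not same_span


def nesting_layers(mentions: List[Mention]) -> Dict[Mention, int]:
    """Assign each mention its layer: 1 for outermost, +1 per enclosing mention."""
    depths: Dict[Mention, int] = {}
    # Longer spans are processed first, so every container is scored
    # before any mention it contains.
    for m in sorted(mentions, key=lambda m: m[1] - m[0], reverse=True):
        parent_depths = [d for p, d in depths.items() if contains(p, m)]
        depths[m] = 1 + max(parent_depths, default=0)
    return depths


def max_layers(mentions: List[Mention]) -> int:
    """Number of nesting layers in a sentence (0 if it has no mentions)."""
    return max(nesting_layers(mentions).values(), default=0)


# Illustrative sentence and labels only; not drawn from the dataset.
tokens = ["The", "Bank", "of", "New", "York", "opened", "today"]
mentions = [
    (1, 5, "ORG"),   # "Bank of New York"
    (3, 5, "CITY"),  # "New York", nested inside the ORG mention
    (6, 7, "DATE"),  # "today"
]
layers = nesting_layers(mentions)
```

A flat NER scheme would keep only the outermost `ORG` span here; a nested scheme also retains the inner `CITY` mention, which is the additional semantic information the abstract refers to.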
