DEFT: A corpus for definition extraction in free- and semi-structured text

Definition extraction has been a popular topic in NLP research for well over a decade, but it has historically been limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allow us to explore the less straightforward examples of term-definition structures in free- and semi-structured text.
