Applying Dependency Relations to Definition Extraction

Definition Extraction (DE) is the task of automatically identifying definitional knowledge in naturally occurring text. It has applications in ontology generation, glossary creation, and question answering. Although the traditional approach to DE has relied on hand-crafted pattern-matching rules, recent methods use learning algorithms to classify sentences as definitional or non-definitional. This paper presents a supervised approach to DE that uses only syntactic features derived from dependency relations. We model the problem as a classification task in which each sentence is labeled as definitional or non-definitional. We compare our results with two well-known approaches: first, a supervised method based on Word-Class Lattices, and second, an unsupervised approach based on mining recurrent patterns. Our competitive results suggest that syntactic information alone can contribute substantially to the development and improvement of DE systems.
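
To make the idea of classifying sentences from dependency-derived features concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes each sentence has already been parsed into (head POS, relation, dependent POS) triples; the triples, the relation labels, the feature templates, the function names, and the choice of a perceptron are all illustrative assumptions.

```python
from collections import Counter

def dependency_features(triples):
    """Count relation labels and full (head POS, relation, dependent POS) triples."""
    feats = Counter()
    for head_pos, rel, dep_pos in triples:
        feats["rel=" + rel] += 1
        feats["triple=" + head_pos + ">" + rel + ">" + dep_pos] += 1
    return feats

def score(weights, feats):
    """Dot product of the weight vector and a sparse feature vector."""
    return sum(weights[name] * value for name, value in feats.items())

def train_perceptron(examples, epochs=10):
    """Train a binary perceptron; labels are +1 (definitional) or -1 (not)."""
    weights = Counter()
    for _ in range(epochs):
        for triples, label in examples:
            feats = dependency_features(triples)
            # Update only when the current weights misclassify the example.
            if score(weights, feats) * label <= 0:
                for name, value in feats.items():
                    weights[name] += label * value
    return weights

def is_definitional(weights, triples):
    """Predict whether a parsed sentence is definitional."""
    return score(weights, dependency_features(triples)) > 0

# Hand-written toy parses (assumed, not real parser output).
# Roughly "A sonnet is a poem":
definitional = [("NOUN", "nsubj", "NOUN"), ("NOUN", "cop", "AUX"), ("NOUN", "det", "DET")]
# Roughly "She read the poem":
non_definitional = [("VERB", "nsubj", "PRON"), ("VERB", "dobj", "NOUN"), ("NOUN", "det", "DET")]

weights = train_perceptron([(definitional, 1), (non_definitional, -1)])
print(is_definitional(weights, definitional))
print(is_definitional(weights, non_definitional))
```

In this sketch no word forms enter the feature set, which mirrors the purely syntactic setting described above; a real system would obtain the triples from a dependency parser and train on an annotated corpus.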
