Evaluation of Automatic Hypernym Extraction from Technical Corpora in English and Dutch

In this research, we evaluate different approaches for the automatic extraction of hypernym relations from English and Dutch technical text. The detected hypernym relations should enable us to semantically structure term lists that are automatically obtained from domain- and user-specific data. We investigated three hypernym extraction approaches for Dutch and English: a lexico-syntactic pattern-based approach, a distributional model and a morpho-syntactic method. To test the performance of these approaches on domain-specific data, we collected and manually annotated English and Dutch data from two technical domains, viz. the dredging and financial domains. The experimental results show that the morpho-syntactic approach in particular obtains good results for automatic hypernym extraction from technical, domain-specific texts.
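Two of the three approaches can be illustrated with a minimal Python sketch. It assumes a single Hearst-style "X such as Y" pattern matched over plain lower-cased words (a real system would match parsed noun phrases and many more patterns). It also assumes that the morpho-syntactic method treats the right-hand head of a term as its hypernym: the final word(s) of an English multiword term, or the rightmost part of a Dutch closed compound such as "sleephopperzuiger". The functions `pattern_hypernyms` and `suffix_hypernyms`, the `min_len` threshold and the example terms are hypothetical illustrations, not the evaluated systems; the distributional model is not sketched.

```python
import re

# Hypothetical Hearst-style pattern: "X such as Y1, Y2 and Y3".
# Assumption: noun phrases are approximated by single words; a real system
# would match chunked noun phrases produced by a parser.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")


def pattern_hypernyms(sentence):
    """Return (hyponym, hypernym) pairs found with the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        # Split the enumeration "a, b and c" into its individual hyponyms.
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs


def suffix_hypernyms(term, vocabulary, min_len=4):
    """Candidate hypernyms of `term`: known terms that form its right-hand head.

    Hypothetical head heuristic: `vocabulary` stands for an automatically
    extracted term list; `min_len` filters out very short, spurious suffixes.
    """
    candidates = []
    for other in vocabulary:
        if other == term or len(other) < min_len:
            continue
        if " " in term:
            # English multiword term: match whole words at the right edge.
            if term.endswith(" " + other):
                candidates.append(other)
        elif term.endswith(other):
            # Dutch closed compound: the head is the rightmost part of the word.
            candidates.append(other)
    return sorted(candidates)


print(pattern_hypernyms("Vessels such as dredgers, barges and tugs operate here."))
print(suffix_hypernyms("cutter suction dredger",
                       ["dredger", "suction dredger", "cutter suction dredger", "dredging"]))
print(suffix_hypernyms("sleephopperzuiger",
                       ["hopperzuiger", "zuiger", "sleephopperzuiger"]))
```

Matching suffixes against the term list avoids the need for a full compound splitter, at the cost of spurious matches for short or ambiguous heads, which the `min_len` threshold only partly filters.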
