Learning Syntactic Patterns for Automatic Hypernym Discovery

Semantic taxonomies such as WordNet provide a rich source of knowledge for natural language processing applications, but are expensive to build, maintain, and extend. Motivated by the problem of automatically constructing and extending such taxonomies, in this paper we present a new algorithm for automatically learning hypernym (is-a) relations from text. Our method generalizes earlier work that had relied on using small numbers of hand-crafted regular expression patterns to identify hypernym pairs. Using "dependency path" features extracted from parse trees, we introduce a general-purpose formalization and generalization of these patterns. Given a training set of text containing known hypernym pairs, our algorithm automatically extracts useful dependency paths and applies them to new corpora to identify novel pairs. On our evaluation task (determining whether two nouns in a news article participate in a hypernym relationship), our automatically extracted database of hypernyms attains both higher precision and higher recall than WordNet.

[1]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[2]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[3]  Donna K. Harman,et al.  The DARPA TIPSTER project , 1992, SIGF.

[4]  Naftali Tishby,et al.  Distributional Clustering of English Words , 1993, ACL.

[5]  Hinrich Schütze,et al.  Customizing a Lexicon to Better Suit a Computational Task , 1996 .

[6]  Ellen Riloff,et al.  A Corpus-Based Approach for Building Semantic Lexicons , 1997, EMNLP.

[7]  Brian Roark,et al.  Noun-Phrase Co-Occurence Statistics for Semi-Automatic Semantic Lexicon Construction , 1998, COLING-ACL.

[8]  Sharon A. Caraballo Automatic construction of a hypernym-labeled noun hierarchy from text , 1999, ACL.

[9]  Brian Roark,et al.  Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction , 2000, COLING.

[10]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[11]  Patrick Pantel,et al.  Discovery of inference rules for question-answering , 2001, Natural Language Engineering.

[12]  Eduard H. Hovy,et al.  Learning surface text patterns for a Question Answering System , 2002, ACL.

[13]  Massimiliano Ciaramita,et al.  Supersense Tagging of Unknown Nouns in WordNet , 2003, EMNLP.

[14]  Dan I. Moldovan,et al.  Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations , 2003, NAACL.

[15]  Huihsin Tseng Semantic Classification of Chinese Unknown Words , 2003, ACL.

[16]  Dominic Widdows,et al.  Unsupervised methods for developing taxonomies by combining syntactic and statistical information , 2003, NAACL.

[17]  Thomas Hofmann,et al.  Hierarchical Semantic Classification: Word Sense Disambiguation with World Knowledge , 2003, IJCAI.

[18]  Dekang Lin,et al.  Dependency-Based Evaluation of Minipar , 2003 .

[19]  David R. Karger,et al.  Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.

[20]  Patrick Pantel,et al.  Clustering by committee , 2003 .

[21]  Jeffrey P. Bigham,et al.  Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems , 2003, ArXiv.

[22]  Ralph Grishman,et al.  Discovering Relations among Named Entities from Large Corpora , 2004, ACL.

[23]  Patrick Pantel,et al.  Automatically Labeling Semantic Classes , 2004, NAACL.