A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts

This paper describes a bootstrapping algorithm called Basilisk that learns high-quality semantic lexicons for multiple categories. Basilisk begins with an unannotated corpus and seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk hypothesizes the semantic class of a word based on collective information over a large body of extraction pattern contexts. We evaluate Basilisk on six semantic categories. The semantic lexicons produced by Basilisk have higher precision than those produced by previous techniques, with several categories showing substantial improvement.

[1]  Brian Roark,et al.  Noun-Phrase Co-Occurence Statistics for Semi-Automatic Semantic Lexicon Construction , 1998, COLING-ACL.

[2]  Ellen Riloff,et al.  An Empirical Approach to Conceptual Case Frame Acquisition , 1998, VLC@COLING/ACL.

[3]  David Yarowsky,et al.  A method for disambiguating word senses in a large corpus , 1992, Comput. Humanit..

[4]  Scott Bennett,et al.  Applying machine learning to anaphora resolution , 1995, Learning for Natural Language Processing.

[5]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[6]  Sharon A. Caraballo Automatic construction of a hypernym-labeled noun hierarchy from text , 1999, ACL.

[7]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[8]  David Yarowsky,et al.  Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence , 1999, EMNLP.

[9]  Sanda M. Harabagiu,et al.  LASSO: A Tool for Surfing the Answer Net , 1999, TREC.

[10]  Eric Brill,et al.  A Rule-Based Approach to Prepositional Phrase Attachment Disambiguation , 1994, COLING.

[11]  Ellen Riloff,et al.  Automatically Generating Extraction Patterns from Untagged Text , 1996, AAAI/IAAI, Vol. 2.

[12]  Yoram Singer,et al.  Unsupervised Models for Named Entity Classification , 1999, EMNLP.

[13]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[14]  David Fisher,et al.  CRYSTAL: Inducing a Conceptual Dictionary , 1995, IJCAI.

[15]  John Hale,et al.  A Statistical Approach to Anaphora Resolution , 1998, VLC@COLING/ACL.

[16]  Lynette Hirschman,et al.  Deep Read: A Reading Comprehension System , 1999, ACL.

[17]  Ellen Riloff,et al.  A Corpus-Based Approach for Building Semantic Lexicons , 1997, EMNLP.

[18]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[19]  Brian Roark,et al.  Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction , 2000, COLING.