Paper information - Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

Multi-category bootstrapping algorithms were developed to reduce semantic drift. By extracting multiple semantic lexicons simultaneously, a category's search space may be restricted. The best results have been achieved through reliance on manually crafted negative categories. Unfortunately, identifying these categories is non-trivial, and their use shifts the unsupervised bootstrapping paradigm towards a supervised framework. We present NEG-FINDER, the first approach for discovering negative categories automatically. NEG-FINDER exploits unsupervised term clustering to generate multiple negative categories during bootstrapping. Our algorithm effectively removes the necessity of manual intervention and formulation of negative categories, with performance closely approaching that obtained using negative categories defined by a domain expert.

Tara McIntosh
