Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

Multi-category bootstrapping algorithms were developed to reduce semantic drift. By extracting multiple semantic lexicons simultaneously, a category's search space may be restricted. The best results have been achieved through reliance on manually crafted negative categories. Unfortunately, identifying these categories is non-trivial, and their use shifts the unsupervised bootstrapping paradigm towards a supervised framework. We present NEG-FINDER, the first approach for discovering negative categories automatically. NEG-FINDER exploits unsupervised term clustering to generate multiple negative categories during bootstrapping. Our algorithm effectively removes the necessity of manual intervention and formulation of negative categories, with performance closely approaching that obtained using negative categories defined by a domain expert.

[1]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[2]  S.J.J. Smith,et al.  Empirical Methods for Artificial Intelligence , 1995 .

[3]  Ellen Riloff,et al.  A Corpus-Based Approach for Building Semantic Lexicons , 1997, EMNLP.

[4]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[5]  Ralph Grishman,et al.  Unsupervised Learning of Generalized Names , 2002, COLING.

[6]  Ellen Riloff,et al.  A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts , 2002, EMNLP.

[7]  Eduard H. Hovy,et al.  Learning surface text patterns for a Question Answering System , 2002, ACL.

[8]  Ralph Grishman,et al.  Bootstrapped Learning of Semantic Classes from Positive and Negative Examples , 2003 .

[9]  Patrick Pantel,et al.  Clustering by committee , 2003 .

[10]  Hong Yu,et al.  Extracting synonymous gene and protein terms from biological literature , 2003, ISMB.

[11]  Marti A. Hearst,et al.  TREC 2007 Genomics Track Overview , 2007, TREC.

[12]  James Richard Curran,et al.  From distributional to semantic similarity , 2004 .

[13]  Claire Grover,et al.  Tools to Address the Interdependence between Tokenisation and Standoff Annotation , 2006, NLPXML@EACL.

[14]  Jeffrey P. Bigham,et al.  Names and Similarities on the Web: Fact Extraction in the Fast Lane , 2006, ACL.

[15]  J. Curran,et al.  Minimising semantic drift with Mutual Exclusion Bootstrapping , 2007 .

[16]  James R. Curran,et al.  Weighted Mutual Exclusion Bootstrapping for Domain Independent Lexicon and Template Acquisition , 2008, ALTA.

[17]  Eric Crestan,et al.  Web-Scale Distributional Similarity and Entity Set Expansion , 2009, EMNLP.

[18]  James R. Curran,et al.  Reducing Semantic Drift with Bagging and Distributional Similarity , 2009, ACL.

[19]  Estevam R. Hruschka,et al.  Coupled semi-supervised learning for information extraction , 2010, WSDM '10.