Ensemble Methods for Automatic Thesaurus Extraction

Ensemble methods are state of the art for many NLP tasks. Recent work by Banko and Brill (2001) suggests that this would not necessarily be true if very large training corpora were available. However, their results are limited by the simplicity of their evaluation task and individual classifiers. Our work explores ensemble efficacy for the more complex task of automatic thesaurus extraction on up to 300 million words. We examine our conflicting results in terms of the constraints on, and complexity of, different contextual representations, which contribute to the sparseness- and noise-induced bias behaviour of NLP systems on very large corpora.

[1] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[2] Gregory Grefenstette, et al. Explorations in automatic thesaurus discovery, 1994.

[3] Erik F. Tjong Kim Sang, et al. Noun Phrase Recognition by System Combination, 2000, ANLP.

[4] Michele Banko, et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation, 2001, ACL.

[5] Dekang Lin, et al. An Information-Theoretic Definition of Similarity, 1998, ICML.

[6] Exploiting Diversity in Natural Language Processing: Combining Parsers, 1999.

[7] Thomas G. Dietterich. Multiple Classifier Systems, 2000, Lecture Notes in Computer Science.

[8] Dekang Lin, et al. Dependency-Based Evaluation of Minipar, 2003.

[9] Ted Pedersen, et al. A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation, 2000, ANLP.

[10] Eric Brill, et al. Classifier Combination for Improved Lexical Disambiguation, 1998, ACL.

[11] John A. Carroll, et al. Robust, applied morphological generation, 2000, INLG.

[12] B. V. Verghese, et al. Thesaurus of English Words and Phrases, 2002.

[13] Hans van Halteren, et al. Improving Data Driven Wordclass Tagging by System Combination, 1998, ACL.

[14] Naftali Tishby, et al. Distributional Clustering of English Words, 1993, ACL.

[15] James R. Curran, et al. Scaling Context Space, 2002, ACL.

[16] Eric Brill, et al. Exploiting Diversity in Natural Language Processing: Combining Parsers, 1999, EMNLP.

[17] Steven P. Abney. Partial parsing via finite-state cascades, 1996, Natural Language Engineering.

[18] Stephen Clark, et al. Class-Based Probability Estimation Using a Semantic Hierarchy, 2002, CL.

[19] Andrew McCallum, et al. Distributional clustering of words for text classification, 1998, SIGIR '98.

[20] James R. Curran, et al. Improvements in Automatic Thesaurus Extraction, 2002, ACL 2002.

[21] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[22] Gerda Ruge, et al. Automatic Detection of Thesaurus relations for Information Retrieval Applications, 1997, Foundations of Computer Science: Potential - Theory - Cognition.

[23] Darren Pearce, et al. Synonymy in collocation extraction, 2001.