Paper information - Ensemble Methods for Automatic Thesaurus Extraction

Ensemble Methods for Automatic Thesaurus Extraction

Ensemble methods are state of the art for many NLP tasks. Recent work by Banko and Brill (2001) suggests that this would not necessarily be true if very large training corpora were available. However, their results are limited by the simplicity of their evaluation task and individual classifiers. Our work explores ensemble efficacy for the more complex task of automatic thesaurus extraction on up to 300 million words. We examine our conflicting results in terms of the constraints on, and complexity of, different contextual representations, which contribute to the sparseness- and noise-induced bias behaviour of NLP systems on very large corpora.

James Curran | J. Curran

[1] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[2] Gregory Grefenstette, et al. Explorations in automatic thesaurus discovery, 1994.

[3] Erik F. Tjong Kim Sang, et al. Noun Phrase Recognition by System Combination, 2000, ANLP.

[4] Michele Banko, et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation, 2001, ACL.

[5] Dekang Lin, et al. An Information-Theoretic Definition of Similarity, 1998, ICML.

[6] John C. Henderson, et al. Exploiting Diversity in Natural Language Processing: Combining Parsers, 1999.

[7] Thomas G. Dietterich. Multiple Classifier Systems, 2000, Lecture Notes in Computer Science.

[8] Dekang Lin, et al. Dependency-Based Evaluation of Minipar, 2003.

[9] Ted Pedersen, et al. A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation, 2000, ANLP.

[10] Eric Brill, et al. Classifier Combination for Improved Lexical Disambiguation, 1998, ACL.

[11] John A. Carroll, et al. Robust, applied morphological generation, 2000, INLG.