Paper information - Error-driven generalist+experts (edge): a multi-stage ensemble framework for text categorization

We introduce a multi-stage ensemble framework, Error-Driven Generalist+Experts, or Edge, for improved classification on large-scale text categorization problems. Edge first trains a generalist, capable of classifying under all classes, to deliver a reasonably accurate initial category ranking for a given instance. Edge then computes a confusion graph for the generalist and allocates learning resources to train experts on relatively small groups of classes that the generalist tends to confuse systematically with one another. The experts' votes, when invoked on a given instance, yield a reranking of the classes, thereby correcting the errors of the generalist. Our evaluations show improved classification and ranking performance on several large-scale text categorization datasets. Edge is particularly efficient when the underlying learners are efficient. Our study of confusion graphs is also of independent interest.
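The confusion-graph step described in the abstract can be illustrated with a small sketch. The abstract does not say how the graph is built or how classes are grouped for the experts, so everything below is an assumption made for illustration: the function names `confusion_graph` and `confused_groups`, the `min_count` threshold, the use of connected components as the grouping rule, and the toy labels. It is not the authors' procedure.

```python
from collections import Counter, defaultdict


def confusion_graph(true_labels, predicted_labels):
    """Count directed confusions: key (t, p) means true class t was predicted as p."""
    edges = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t != p:
            edges[(t, p)] += 1
    return edges


def confused_groups(edges, min_count=2):
    """Group classes connected by confusions seen at least min_count times.

    Assumption: directions are merged and groups are connected components
    of the thresholded undirected graph. The paper may group differently.
    """
    # Merge the two directions of each confused pair into one undirected weight.
    weight = Counter()
    for (t, p), count in edges.items():
        weight[frozenset((t, p))] += count

    # Keep only pairs that are confused often enough.
    adjacency = defaultdict(set)
    for pair, count in weight.items():
        if count >= min_count:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search to collect connected components.
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy generalist output (hypothetical data, not from the paper).
true = ["cat", "cat", "dog", "dog", "car", "truck", "truck", "bird"]
pred = ["dog", "dog", "cat", "dog", "truck", "car", "car", "bird"]

edges = confusion_graph(true, pred)
groups = confused_groups(edges)
print(groups)
```

Under these assumptions, each resulting group would be a candidate set of classes for one expert, while classes the generalist rarely confuses (here `bird`) get no expert.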

C. Lee Giles | Jian Huang | Omid Madani
