Error-driven generalist+experts (edge): a multi-stage ensemble framework for text categorization

We introduce a multi-stage ensemble framework, Error-Driven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a generalist, capable of classifying under all classes, to deliver a reasonably accurate initial category ranking given an instance. Edge then computes a confusion graph for the generalist and allocates the learning resources to train experts on relatively small groups of classes that tend to be systematically confused with one another by the generalist. The experts' votes, when invoked on a given instance, yield a reranking of the classes, thereby correcting the errors of the generalist. Our evaluations showcase the improved classification and ranking performance on several large-scale text categorization datasets. Edge is in particular efficient when the underlying learners are efficient. Our study of confusion graphs is also of independent interest.

[1]  Susan T. Dumais,et al.  Hierarchical classification of Web content , 2000, SIGIR '00.

[2]  Kagan Tumer,et al.  Analysis of decision boundaries in linearly combined neural classifiers , 1996, Pattern Recognit..

[3]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[4]  M. Newman,et al.  Mixing patterns in networks. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  Yiming Yang,et al.  An Evaluation of Statistical Approaches to Text Categorization , 1999, Information Retrieval.

[6]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[7]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[8]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[9]  Omid Madani,et al.  Large-Scale Many-Class Learning , 2008, SDM.

[10]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[11]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[12]  Mohammad R. Salavatipour,et al.  Recall Systems: Effcient Learning and Use of Category Indices , 2007, AISTATS.

[13]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[14]  Yoram Singer,et al.  Using and combining predictors that specialize , 1997, STOC '97.

[15]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[16]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[17]  William W. Cohen,et al.  Single-pass online learning: performance, voting schemes and online feature selection , 2006, KDD '06.

[18]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[19]  Jian Huang,et al.  On updates that constrain the features' connections during learning , 2008, KDD.

[20]  S. Sathiya Keerthi,et al.  A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..

[21]  Kagan Tumer,et al.  Robust Combining of Disparate Classifiers through Order Statistics , 1999, Pattern Analysis & Applications.

[22]  David R. Karger,et al.  Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.

[23]  Adam Krzyżak,et al.  Methods of combining multiple classifiers and their applications to handwriting recognition , 1992, IEEE Trans. Syst. Man Cybern..

[24]  Koby Crammer,et al.  A Family of Additive Online Algorithms for Category Ranking , 2003, J. Mach. Learn. Res..

[25]  Chris Buckley,et al.  OHSUMED: an interactive retrieval evaluation and new large test collection for research , 1994, SIGIR '94.

[26]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[27]  Yiming Yang,et al.  A scalability analysis of classifiers in text categorization , 2003, SIGIR.

[28]  Yiming Yang,et al.  Support vector machines classification with a very large-scale taxonomy , 2005, SKDD.

[29]  Sunita Sarawagi,et al.  Scaling multi-class support vector machines using inter-class confusion , 2002, KDD.

[30]  Rich Caruana,et al.  Data mining in metric space: an empirical analysis of supervised learning performance criteria , 2004, ROCAI.

[31]  Andrea Esuli,et al.  TreeBoost.MH: A Boosting Algorithm for Multi-label Hierarchical Text Categorization , 2006, SPIRE.

[32]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.