The Combination of Text Classifiers Using Reliability Indicators

The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of contributing classifiers. The method harnesses reliability indicators—variables that provide signals about the performance of classifiers in different situations. We provide background, present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs, and review a set of comparative studies undertaken to evaluate the methodology.
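
As a rough illustration of combining classifier outputs with reliability indicators, the Python sketch below trains a small metaclassifier on three kinds of input: the base classifiers' scores, the indicator values, and their pairwise products. The product terms let the weight given to each classifier change with context. The abstract does not say which model the metaclassifier uses. Plain logistic regression fitted by batch gradient descent is substituted here as an assumption. The helper names (`meta_features`, `train_metaclassifier`, `predict_proba`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def meta_features(scores: Sequence[float], indicators: Sequence[float]) -> List[float]:
    """Hypothetical feature map: bias, base-classifier scores, reliability
    indicators, and every score-by-indicator product. The products allow the
    weight on a classifier's score to depend on the indicator values."""
    products = [s * r for s in scores for r in indicators]
    return [1.0, *scores, *indicators, *products]


def train_metaclassifier(X: List[List[float]], y: List[int],
                         lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean logistic (cross-entropy) loss.
    Logistic regression is an assumed stand-in for the paper's combiner."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Metaclassifier estimate of P(class = 1) for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: ((score_A, score_B), indicator, label).
# Classifier A tracks the label when the indicator is 0; B tracks it when it is 1.
examples: List[Tuple[Tuple[float, float], float, int]] = [
    ((0.9, 0.1), 0.0, 1), ((0.1, 0.9), 0.0, 0),
    ((0.9, 0.1), 1.0, 0), ((0.1, 0.9), 1.0, 1),
]
X = [meta_features(s, [r]) for s, r, _ in examples]
y = [label for _, _, label in examples]
w = train_metaclassifier(X, y)

# Same base scores, different context: the combiner should follow A, then B.
p_context0 = predict_proba(w, meta_features((0.7, 0.3), [0.0]))
p_context1 = predict_proba(w, meta_features((0.7, 0.3), [1.0]))
print(f"{p_context0:.2f} {p_context1:.2f}")  # first above 0.5, second below
```

In the toy data, classifier A agrees with the label when the indicator is 0 and is anti-correlated with it when the indicator is 1, and B behaves the opposite way. A linear combiner over scores and indicators alone could not separate these four examples. The product terms are what let the learned weights switch trust from A to B as the context changes.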
