Paper information - Probabilistic combination of text classifiers using reliability indicators: models and results

The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of contributing classifiers. The method harnesses reliability indicators, variables that provide a valuable signal about the performance of classifiers in different situations. We provide background, present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs, and review a set of comparative studies undertaken to evaluate the methodology.
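The idea of a metaclassifier that sees both classifier outputs and reliability indicators can be illustrated with a toy sketch. This is not the authors' procedure: the abstract does not specify the metaclassifier, so the code below assumes a deliberately simple one (a single threshold on one indicator, with the most accurate base classifier chosen on each side). The `Example` type, the function names, the use of document length as the indicator, and all data values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Example:
    outputs: Tuple[float, ...]  # P(in class) estimated by each base classifier
    indicator: float            # one reliability indicator (assumed: document length)
    label: int                  # 1 = in class, 0 = not in class


def hits(examples: Sequence[Example], clf: int) -> int:
    """Count examples that base classifier `clf` labels correctly at a 0.5 cutoff."""
    return sum(int(ex.outputs[clf] >= 0.5) == ex.label for ex in examples)


def best_classifier(examples: Sequence[Example], n_clf: int) -> int:
    """Index of the base classifier with the most hits (ties go to the lowest index)."""
    return max(range(n_clf), key=lambda i: hits(examples, i))


def fit_meta(examples: List[Example]) -> Tuple[float, int, int]:
    """Pick an indicator threshold and one trusted classifier per side of it."""
    n_clf = len(examples[0].outputs)
    values = sorted({ex.indicator for ex in examples})
    overall = best_classifier(examples, n_clf)
    # Fallback when no split is possible: trust the overall best classifier everywhere.
    best = (hits(examples, overall), values[0], overall, overall)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        low = [ex for ex in examples if ex.indicator <= t]
        high = [ex for ex in examples if ex.indicator > t]
        i_low, i_high = best_classifier(low, n_clf), best_classifier(high, n_clf)
        score = hits(low, i_low) + hits(high, i_high)
        if score > best[0]:
            best = (score, t, i_low, i_high)
    return best[1], best[2], best[3]


def predict(model: Tuple[float, int, int], outputs: Sequence[float], indicator: float) -> int:
    """Use the classifier the model trusts for this indicator value."""
    t, i_low, i_high = model
    chosen = i_low if indicator <= t else i_high
    return int(outputs[chosen] >= 0.5)


# Made-up training data: classifier 0 is right on short documents,
# classifier 1 is right on long ones.
train = [
    Example((0.9, 0.4), 50, 1),
    Example((0.2, 0.7), 60, 0),
    Example((0.8, 0.3), 80, 1),
    Example((0.3, 0.9), 900, 1),
    Example((0.6, 0.1), 1200, 0),
    Example((0.4, 0.8), 1500, 1),
]

model = fit_meta(train)
print(model)                             # (490.0, 0, 1)
print(predict(model, (0.9, 0.2), 70))    # 1: short document, classifier 0 is trusted
print(predict(model, (0.9, 0.2), 1000))  # 0: long document, classifier 1 is trusted
```

The same pair of base-classifier outputs yields different final decisions depending on the indicator value, which is the context sensitivity the abstract describes. A realistic metaclassifier would learn from many indicators and all outputs jointly rather than from a single hand-picked split.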

Susan T. Dumais | Paul N. Bennett | Eric Horvitz
