Transforming classifier scores into accurate multiclass probability estimates

Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Previous calibration methods apply only to two-class problems. Here, we show how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates. We also propose a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples. Using naive Bayes and support vector machine classifiers, we give experimental results from a variety of two-class and multiclass domains, including direct marketing, text categorization and digit recognition.
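As a rough illustration of the two ideas in the abstract, the sketch below first maps a ranking of examples to monotone probability estimates and then combines per-class binary estimates into a multiclass distribution. The abstract does not name the specific techniques, so both choices here are assumptions: the pool-adjacent-violators fit (`pav`) stands in for "a calibration method that needs only a ranking", and simple normalization (`normalize_one_vs_rest`) stands in for "combining calibrated binary probability estimates". The function names are hypothetical.

```python
def pav(labels_sorted_by_score):
    """Fit a non-decreasing sequence to 0/1 labels (pool-adjacent-violators).

    Assumption: labels are ordered by increasing classifier score, so only
    the ranking produced by the classifier is used, not the score values.
    """
    blocks = []  # each block holds [sum of labels, number of examples]
    for y in labels_sorted_by_score:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every example in a block gets its mean
    return fitted


def normalize_one_vs_rest(binary_probs):
    """Turn per-class binary estimates P(class k vs. rest) into a distribution.

    Assumption: plain normalization; other combination schemes exist.
    """
    total = sum(binary_probs)
    if total == 0:
        return [1.0 / len(binary_probs)] * len(binary_probs)
    return [p / total for p in binary_probs]


if __name__ == "__main__":
    # Five examples ranked by score, with their true binary labels.
    print(pav([0, 1, 0, 1, 1]))
    # Three calibrated one-vs-rest estimates for a single example.
    print(normalize_one_vs_rest([0.5, 0.25, 0.5]))
```

In this sketch the calibrated estimate for a new example would be read off from the fitted value of the training examples with neighboring scores; that lookup step is omitted to keep the example short.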
