Binary classifier calibration using an ensemble of piecewise linear regression models

In this paper, we present a new nonparametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered an extension of BBQ (Naeini et al., in: Proceedings of twenty-ninth AAAI conference on artificial intelligence, 2015b), a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) (Zadrozny and Elkan, in: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining 2002). ENIR is designed to address the key limitation of IsoRegC, namely its assumption that the true probabilities are monotonic in the classifier's predictions. Like BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus, it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real datasets we evaluated, ENIR commonly performs statistically significantly better than the other methods, and never worse. It improves the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in $O(N \log N)$ time, where $N$ is the number of samples.
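To make the post-processing setting concrete, the following is a minimal sketch of the isotonic-regression baseline (in the spirit of IsoRegC), not of ENIR itself. It fits a non-decreasing step function from classifier scores to empirical positive rates with the pool-adjacent-violators (PAV) procedure. The function names `fit_isotonic_calibrator` and `calibrate` and the toy data are assumptions made for illustration and do not come from the paper.

```python
from bisect import bisect_left


def fit_isotonic_calibrator(scores, labels):
    """Fit a non-decreasing step function from scores to P(y = 1) via PAV.

    Hypothetical helper for illustration; not the authors' implementation.
    Returns (uppers, means): block i covers scores up to uppers[i] and
    predicts the calibrated probability means[i].
    """
    # Pool tied scores first: [sum of labels, count] per distinct score.
    pooled = {}
    for s, y in zip(scores, labels):
        entry = pooled.setdefault(s, [0.0, 0])
        entry[0] += y
        entry[1] += 1

    # Each block is [label sum, count, largest score covered].
    blocks = []
    for s in sorted(pooled):  # sorting dominates the cost: O(N log N)
        total, count = pooled[s]
        blocks.append([total, count, s])
        # Merge backwards while adjacent block means are not increasing.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]
    means = [b[0] / b[1] for b in blocks]
    return uppers, means


def calibrate(score, uppers, means):
    """Map a raw score to the mean of the first block whose upper bound is >= score."""
    idx = min(bisect_left(uppers, score), len(means) - 1)
    return means[idx]


# Toy data (assumed): raw classifier scores with their true binary labels.
uppers, means = fit_isotonic_calibrator(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1]
)
print(calibrate(0.25, uppers, means))
```

Because the fitted mapping is forced to be non-decreasing, it cannot represent cases where the true positive rate dips as the score increases. That is the monotonicity assumption the abstract describes ENIR as relaxing.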

[1]  Adrian E. Raftery,et al.  Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors) , 1999 .

[2]  H. D. Brunk,et al.  Statistical inference under order restrictions : the theory and application of isotonic regression , 1973 .

[3]  Bianca Zadrozny,et al.  Transforming classifier scores into accurate multiclass probability estimates , 2002, KDD.

[4]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  Stephen E. Fienberg,et al.  The Comparison and Evaluation of Forecasters. , 1983 .

[7]  R. Iman,et al.  Approximations of the critical region of the Friedman statistic , 1980 .

[8]  Wei Zhang,et al.  Knowledge vault: a web-scale approach to probabilistic knowledge fusion , 2014, KDD.

[9]  Hiroya Takamura,et al.  Direct estimation of class membership probabilities for multiclass classification using multiple scores , 2009, Knowledge and Information Systems.

[10]  M. Friedman The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance , 1937 .

[11]  John Platt,et al.  Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[12]  Mahdi Pakdaman Naeini,et al.  Binary Classifier Calibration Using an Ensemble of Linear Trend Estimation , 2016, SDM.

[13]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[14]  Mahdi Pakdaman Naeini,et al.  Binary Classifier Calibration Using an Ensemble of Near Isotonic Regression Models , 2015, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[15]  Bianca Zadrozny,et al.  Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers , 2001, ICML.

[16]  Tom Fawcett,et al.  PAV and the ROC convex hull , 2007, Machine Learning.

[17]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[18]  Adrian E. Raftery,et al.  Bayesian Model Averaging: A Tutorial , 2016 .

[19]  J. Cavanaugh Unifying the derivations for the Akaike and corrected Akaike information criteria , 1997 .

[20]  Tomás Pajdla,et al.  Learning and Calibrating Per-Location Classifiers for Visual Place Recognition , 2013, International Journal of Computer Vision.

[21]  Gaurav Pandey,et al.  A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics , 2013, 2013 IEEE 13th International Conference on Data Mining.

[22]  Marko Robnik-Sikonja,et al.  Explaining Classifications For Individual Instances , 2008, IEEE Transactions on Knowledge and Data Engineering.

[23]  Philip E. Gill,et al.  Practical optimization , 1981 .

[24]  Harry Zhang,et al.  Naive Bayesian Classifiers for Ranking , 2004, ECML.

[25]  Leon Wenliang Zhong,et al.  Accurate Probability Calibration for Multiple Classifiers , 2013, IJCAI.

[26]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[27]  Jihoon Kim,et al.  Calibrating predictive model estimates to support personalized medicine , 2011, J. Am. Medical Informatics Assoc..

[28]  Xiaoqian Jiang,et al.  Predicting accurate probabilities with a ranking loss , 2012, ICML.

[29]  Milos Hauskrecht,et al.  Binary Classifier Calibration Using a Bayesian Non-Parametric Approach , 2015, SDM.

[30]  Richard E. Neapolitan,et al.  Learning Bayesian networks , 2007, KDD '07.

[31]  Rich Caruana,et al.  Predicting good probabilities with supervised learning , 2005, ICML.

[32]  Milos Hauskrecht,et al.  Obtaining Well Calibrated Probabilities Using Bayesian Binning , 2015, AAAI.

[33]  Ryan J. Tibshirani,et al.  Fast and Flexible ADMM Algorithms for Trend Filtering , 2014, ArXiv.

[34]  Björn E. Ottersten,et al.  Improving Credit Card Fraud Detection with Calibrated Probabilities , 2014, SDM.

[35]  Bianca Zadrozny,et al.  Learning and making decisions when costs and probabilities are both unknown , 2001, KDD '01.

[36]  Robert Tibshirani,et al.  Nearly-Isotonic Regression , 2011, Technometrics.

[37]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[38]  Nasser Yazdani,et al.  Application of ensemble models in web ranking , 2010, 2010 5th International Symposium on Telecommunications.

[39]  Moisés Goldszmidt,et al.  Properties and Benefits of Calibrated Classifiers , 2004, PKDD.

[40]  Liangxiao Jiang,et al.  Learning k-Nearest Neighbor Naive Bayes for Ranking , 2005, ADMA.

[41]  Stephen P. Boyd,et al.  $\ell_1$ Trend Filtering , 2009, SIAM Rev..

[42]  Byron C. Wallace,et al.  Improving class probability estimates for imbalanced data , 2013, Knowledge and Information Systems.

[43]  José Hernández-Orallo,et al.  On the effect of calibration in classifier combination , 2013, Applied Intelligence.