On Calibration of Nested Dichotomies

Nested dichotomies (NDs) are a method for transforming a multiclass classification problem into a series of binary problems: a tree structure recursively splits the set of classes into two subsets, and a binary classification model learns to discriminate between the two subsets at each internal node. Because a class probability estimate is the product of the binary estimates along the corresponding root-to-leaf path, errors in the individual models compound. In this paper, we demonstrate that NDs typically exhibit poor probability calibration even when the binary base models are well calibrated, and that the problem is exacerbated when the binary models are themselves poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full ND structure, especially when the number of classes is high.
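
To make the construction concrete, below is a minimal Python sketch of a single randomly built ND using scikit-learn. The class name `NestedDichotomy`, the uniform random splitting rule, the logistic-regression base learner, and the use of `CalibratedClassifierCV` for per-node Platt scaling are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal ND sketch (assumed structure, not the paper's exact setup).
import numpy as np
from sklearn.base import clone
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression


class NestedDichotomy:
    """A single random ND: one binary model per internal node of a class tree."""

    def __init__(self, base_estimator=None, calibrate_nodes=False, seed=0):
        self.base_estimator = base_estimator or LogisticRegression(max_iter=1000)
        self.calibrate_nodes = calibrate_nodes
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self._index = {c: i for i, c in enumerate(self.classes_)}
        self.root_ = self._build(X, y, list(self.classes_))
        return self

    def _build(self, X, y, classes):
        if len(classes) == 1:
            return {"leaf": classes[0]}
        # Split the class set into two non-empty subsets uniformly at random;
        # random-pair or class-balanced splits are common alternatives.
        perm = self.rng.permutation(classes)
        cut = int(self.rng.integers(1, len(classes)))
        left, right = perm[:cut], perm[cut:]
        target = np.isin(y, right).astype(int)  # 1 iff the class is in `right`
        model = clone(self.base_estimator)
        if self.calibrate_nodes:
            # Calibrate each internal binary model (Platt scaling here);
            # deep nodes see few examples, so cv may need to shrink.
            model = CalibratedClassifierCV(model, method="sigmoid", cv=3)
        model.fit(X, target)
        lmask, rmask = ~target.astype(bool), target.astype(bool)
        return {
            "model": model,
            "left": self._build(X[lmask], y[lmask], list(left)),
            "right": self._build(X[rmask], y[rmask], list(right)),
        }

    def predict_proba(self, X):
        proba = np.zeros((X.shape[0], len(self.classes_)))
        self._accumulate(self.root_, X, np.ones(X.shape[0]), proba)
        return proba  # rows sum to 1 by construction

    def _accumulate(self, node, X, path_prob, out):
        if "leaf" in node:
            out[:, self._index[node["leaf"]]] = path_prob
            return
        p_right = node["model"].predict_proba(X)[:, 1]
        # A class probability is the product of branch probabilities on the
        # root-to-leaf path, so node-level miscalibration compounds.
        self._accumulate(node["left"], X, path_prob * (1.0 - p_right), out)
        self._accumulate(node["right"], X, path_prob * p_right, out)
```

Usage is as simple as `NestedDichotomy(calibrate_nodes=True).fit(X, y).predict_proba(X)`. Calibrating the full ND structure would, under the same illustrative assumptions, additionally fit a calibrator (e.g., one-vs-rest sigmoid or isotonic regression) on the resulting multiclass estimates and renormalize the rows.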
