On the Calibration of Nested Dichotomies for Large Multiclass Tasks

Nested dichotomies are a method for transforming a multiclass classification problem into a series of binary problems. A tree structure is induced that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. In this paper, we demonstrate that these nested dichotomies typically exhibit poor probability calibration, even when the base binary models are well calibrated. We also show that this problem is exacerbated when the binary models are poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full nested dichotomy structure, especially when the number of classes is high.
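To make the structure described above concrete, the following is a minimal sketch (not the authors' implementation) of probability estimation in a nested dichotomy: each internal node holds a binary model that separates two disjoint subsets of classes, and the probability of a class is the product of the branch probabilities on the path from the root to that class's leaf. It assumes scikit-learn-style binary estimators; the names `NDNode`, `build_random_dichotomy`, and `predict_proba` are illustrative only, and the class split at each node is chosen at random.

```python
# Sketch of a nested dichotomy with scikit-learn base models (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression


class NDNode:
    def __init__(self, classes):
        self.classes = list(classes)   # classes reachable from this node
        self.model = None              # binary model at internal nodes
        self.left = None               # subtree for the "0" branch
        self.right = None              # subtree for the "1" branch


def build_random_dichotomy(X, y, classes, rng):
    node = NDNode(classes)
    if len(classes) == 1:              # leaf: a single class remains
        return node
    split = rng.permutation(classes)   # random split of the class set
    left, right = split[: len(split) // 2], split[len(split) // 2:]
    mask = np.isin(y, classes)         # only examples of the relevant classes
    X_node = X[mask]
    y_node = np.isin(y[mask], right).astype(int)
    node.model = LogisticRegression(max_iter=1000).fit(X_node, y_node)
    node.left = build_random_dichotomy(X, y, left, rng)
    node.right = build_random_dichotomy(X, y, right, rng)
    return node


def predict_proba(node, x, p=1.0, out=None):
    # Multiply branch probabilities along each root-to-leaf path.
    out = {} if out is None else out
    if node.model is None:             # leaf node
        out[node.classes[0]] = p
        return out
    p_right = node.model.predict_proba(x.reshape(1, -1))[0, 1]
    predict_proba(node.left, x, p * (1.0 - p_right), out)
    predict_proba(node.right, x, p * p_right, out)
    return out
```

In this sketch, calibrating the internal base models could be done by wrapping the node-level estimator in scikit-learn's CalibratedClassifierCV (Platt scaling or isotonic regression), and the final class-probability vector produced by predict_proba could itself be recalibrated; this mirrors, but does not reproduce, the two-level calibration strategy discussed in the abstract.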
