Large-scale probabilistic predictors with and without guarantees of validity

This paper studies, both theoretically and empirically, a method of turning machine-learning algorithms into probabilistic predictors; the method automatically enjoys a property of validity (perfect calibration) and is computationally efficient. The price to pay for perfect calibration is that these probabilistic predictors produce imprecise probabilities (in practice, almost precise for large data sets). When these imprecise probabilities are merged into precise probabilities, the resulting predictors lose the theoretical property of perfect calibration but are consistently more accurate than existing methods in empirical studies.
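To make the merging step concrete, here is a minimal sketch in Python. It assumes the imprecise prediction is a pair (p0, p1), where p0 and p1 are hypothetical names for the lower and upper estimates of the probability of the positive label. The abstract does not say which merging rule is used; the rule below is an assumption, chosen because it equalizes the worst-case log-loss regret between the two candidate probabilities.

```python
def merge_log_loss(p0: float, p1: float) -> float:
    """Merge an imprecise prediction (p0, p1) into a single probability.

    Assumed rule (not stated in the abstract): choose p so that the
    log-loss regret is the same whether the true probability is p0 or p1:
        log(p1 / p) = log((1 - p0) / (1 - p))
    which solves to p = p1 / (1 - p0 + p1).
    """
    if not (0.0 <= p0 <= 1.0 and 0.0 <= p1 <= 1.0):
        raise ValueError("p0 and p1 must lie in [0, 1]")
    denom = 1.0 - p0 + p1
    if denom == 0.0:
        # Only happens for the degenerate pair p0 = 1, p1 = 0.
        raise ValueError("degenerate pair: p0 = 1 and p1 = 0")
    return p1 / denom


if __name__ == "__main__":
    # A narrow interval, as expected for large data sets, merges to a
    # value between its endpoints.
    print(merge_log_loss(0.30, 0.32))
```

When p0 equals p1 the rule returns that common value, and when p0 is below p1 the merged probability lies between the two, so the merge changes little once the interval is narrow.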