Protected probabilistic classification

This paper proposes a way of protecting probabilistic prediction models against changes in the data distribution, concentrating on classification and paying particular attention to binary classification. Such protection matters in applications of machine learning, where the quality of a trained prediction algorithm may drop significantly once it is deployed. Our techniques are based on recent work on conformal test martingales and older work on prediction with expert advice, namely tracking the best expert. The version of this paper at http://alrw.net (Working Paper 35) is updated most often and is accompanied by Python code.
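To give a rough idea of what "tracking the best expert" can look like for binary probabilistic prediction, here is a minimal sketch under log loss. It is not the procedure of this paper, and it is not the accompanying code. It is a generic Fixed Share-style mixture in which weights are updated by likelihood and a small fraction `alpha` is then redistributed uniformly, so that the mixture can recover when a different expert becomes best. The function name `fixed_share_log_loss`, the parameters `alpha` and `eps`, and the uniform-redistribution variant are all assumptions made for illustration.

```python
import numpy as np


def fixed_share_log_loss(expert_probs, outcomes, alpha=0.01, eps=1e-12):
    """Illustrative Fixed Share-style mixture of experts under log loss.

    expert_probs: array of shape (T, K); entry [t, k] is expert k's
        predicted probability that the label at step t is 1.
    outcomes: array of shape (T,) with labels in {0, 1}.
    alpha: hypothetical switching rate (fraction of weight shared per step).
    Returns the mixture's predicted probabilities of label 1, shape (T,).
    """
    # Clip so that no expert ever assigns probability exactly 0 or 1.
    probs = np.clip(np.asarray(expert_probs, dtype=float), eps, 1.0 - eps)
    outcomes = np.asarray(outcomes)
    n_steps, n_experts = probs.shape

    weights = np.full(n_experts, 1.0 / n_experts)  # uniform prior
    predictions = np.empty(n_steps)

    for t in range(n_steps):
        p = probs[t]
        # Predict with the weighted average before seeing the label.
        predictions[t] = weights @ p
        # Bayesian (likelihood) update once the label is revealed.
        likelihood = np.where(outcomes[t] == 1, p, 1.0 - p)
        weights = weights * likelihood
        weights /= weights.sum()
        # Share step: move a fraction alpha of the weight back to uniform.
        weights = (1.0 - alpha) * weights + alpha / n_experts

    return predictions
```

With `alpha = 0` this reduces to an ordinary Bayesian mixture, which cannot easily recover once one expert has accumulated almost all of the weight. A small positive `alpha` keeps every expert's weight bounded away from zero, which is what lets the mixture follow a change in which expert is best.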
