Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.
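The reframing as multiple hypothesis testing can be illustrated with a minimal sketch. It assumes losses bounded in [0, 1], a finite grid of candidate parameters, a Hoeffding-style p-value for each null hypothesis "the risk at this parameter exceeds alpha", and a Bonferroni correction to control the family-wise error rate at level delta. The function names and the choice of p-value and correction are illustrative assumptions, not necessarily the procedure used in the paper.

```python
import math
from typing import List, Sequence


def hoeffding_p_value(losses: Sequence[float], alpha: float) -> float:
    """P-value for the null 'risk > alpha', assuming i.i.d. losses in [0, 1].

    Uses Hoeffding's inequality: P(empirical risk <= risk - t) <= exp(-2 n t^2).
    """
    n = len(losses)
    empirical_risk = sum(losses) / n
    gap = max(alpha - empirical_risk, 0.0)  # evidence against the null
    return math.exp(-2.0 * n * gap ** 2)


def learn_then_test(
    loss_table: Sequence[Sequence[float]], alpha: float, delta: float
) -> List[int]:
    """Return indices of candidate parameters whose null is rejected.

    loss_table[j] holds the calibration losses for candidate j (hypothetical
    layout). Bonferroni: reject when the p-value is at most delta / m, so that
    with probability at least 1 - delta every returned candidate has
    risk <= alpha.
    """
    m = len(loss_table)
    threshold = delta / m
    return [
        j
        for j, losses in enumerate(loss_table)
        if hoeffding_p_value(losses, alpha) <= threshold
    ]


# Toy calibration data: three candidates, 1000 calibration points each.
toy_losses = [
    [0.0] * 1000,                # empirical risk 0.00
    [1.0] * 90 + [0.0] * 910,    # empirical risk 0.09
    [1.0] * 500 + [0.0] * 500,   # empirical risk 0.50
]
selected = learn_then_test(toy_losses, alpha=0.1, delta=0.1)
```

In this toy setting only the first candidate is selected: the second has empirical risk below alpha, but not by enough to reject its null at the corrected level, which shows how the finite-sample guarantee is stricter than simply thresholding the empirical risk.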
