Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

We introduce Learn then Test, a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees regardless of the underlying model and the (unknown) data-generating distribution. The framework addresses, among other examples, false discovery rate control in multilabel classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and of confidence set coverage in classification or regression. To accomplish this, we solve a key technical challenge: the control of arbitrary risks that are not necessarily monotonic. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, which enables techniques and mathematical arguments that differ from those in prior work. We use our framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision.
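The reduction from risk control to multiple hypothesis testing can be illustrated with a short Python sketch. It is not the authors' reference implementation. It assumes per-example losses bounded in [0, 1], a finite grid of candidate parameters λ, one null hypothesis H_j: R(λ_j) > α per grid point, Hoeffding-based p-values, and a plain Bonferroni correction as the family-wise-error-controlling procedure; the paper discusses other p-values and testing procedures. The function names (`hoeffding_pvalue`, `learn_then_test`, `false_discovery_proportion`) and the synthetic multilabel data are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def hoeffding_pvalue(empirical_risk: float, n: int, alpha: float) -> float:
    """Valid p-value for H_0: R(lambda) > alpha, assuming losses lie in [0, 1].

    Hoeffding's inequality gives P(R_hat <= alpha - t) <= exp(-2 n t^2)
    whenever the true risk is at least alpha.
    """
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap ** 2)


def learn_then_test(
    lambdas: Sequence[float],
    calib_losses: Callable[[float], list[float]],
    alpha: float,
    delta: float,
) -> list[float]:
    """Return every lambda whose null hypothesis R(lambda) > alpha is rejected.

    calib_losses(lam) must return the per-example calibration losses at lam
    (assumed non-empty and bounded in [0, 1]). Bonferroni keeps the
    family-wise error rate at most delta, so with probability >= 1 - delta
    every returned lambda has true risk <= alpha. No monotonicity in lambda
    is required because each lambda is tested separately.
    """
    m = len(lambdas)
    valid = []
    for lam in lambdas:
        losses = calib_losses(lam)
        n = len(losses)
        r_hat = sum(losses) / n
        if hoeffding_pvalue(r_hat, n, alpha) <= delta / m:  # Bonferroni threshold
            valid.append(lam)
    return valid


def false_discovery_proportion(scores: list[float], truth: set[int], lam: float) -> float:
    """Hypothetical multilabel loss: fraction of predicted labels that are wrong."""
    predicted = {k for k, s in enumerate(scores) if s >= lam}
    if not predicted:
        return 0.0
    return len(predicted - truth) / len(predicted)


if __name__ == "__main__":
    # Hypothetical synthetic calibration set: 5 labels, noisy scores near the truth.
    rng = random.Random(0)
    calib = []
    for _ in range(2000):
        truth = {k for k in range(5) if rng.random() < 0.3}
        scores = [
            min(1.0, max(0.0, (0.7 if k in truth else 0.3) + rng.gauss(0.0, 0.2)))
            for k in range(5)
        ]
        calib.append((scores, truth))

    grid = [i / 20 for i in range(21)]
    valid = learn_then_test(
        grid,
        lambda lam: [false_discovery_proportion(s, t, lam) for s, t in calib],
        alpha=0.2,
        delta=0.1,
    )
    print("lambdas certified to keep FDR <= 0.2:", valid)
```

Bonferroni is the simplest valid choice here; a procedure that exploits structure in the grid (for example, testing λ values in a fixed sequence) would typically certify more parameters at the same δ, at the cost of extra assumptions about the ordering.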
