Towards optimally abstaining from prediction with OOD test examples

A common challenge across all areas of machine learning is that training data are often not distributed like test data, due to natural shifts, "blind spots," or adversarial examples; such test examples are referred to as out-of-distribution (OOD) test examples. We consider a model in which the learner may abstain from predicting on any test example at a fixed cost. In particular, our transductive abstention algorithm takes labeled training examples and unlabeled test examples as input, and provides predictions with optimal prediction loss guarantees. The loss bounds match standard generalization bounds when test examples are i.i.d. from the training distribution, but include an additional term equal to the cost of abstaining times the statistical distance between the training and test distributions (or the fraction of adversarial examples). For linear regression, we give a polynomial-time algorithm based on algorithms for the Celis-Dennis-Tapia (CDT) optimization problem. For binary classification, we show how to implement the algorithm efficiently using a proper agnostic learner (i.e., an Empirical Risk Minimizer) for the class of interest. Our work builds on a recent abstention algorithm of Goldwasser, Kalais, and Montasser [10] for transductive binary classification.
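The abstract describes the loss model in words: each abstention costs a fixed amount, and the bound gains an extra term equal to that cost times the statistical distance between the training and test distributions. The Python sketch below illustrates only these two ingredients on a small discrete domain; it is not the paper's algorithm. The function names `abstention_loss`, `total_variation`, and `extra_ood_term`, the use of `None` to mean "abstain", the 0-1 loss for non-abstaining predictions, and the use of total variation between empirical samples as the statistical distance are all illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def abstention_loss(
    predictions: Sequence[Optional[int]], labels: Sequence[int], cost: float
) -> float:
    """Average test loss where a prediction of None means "abstain".

    Illustrative assumption: an abstention costs `cost`, and a real
    prediction incurs the 0-1 loss (1 if wrong, 0 if right).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one test example")
    total = 0.0
    for pred, y in zip(predictions, labels):
        if pred is None:
            total += cost  # fixed cost of abstaining
        elif pred != y:
            total += 1.0  # misclassification
    return total / len(labels)


def total_variation(p_sample: Sequence[Hashable], q_sample: Sequence[Hashable]) -> float:
    """Total variation (statistical) distance between two empirical distributions."""
    if not p_sample or not q_sample:
        raise ValueError("samples must be non-empty")
    p, q = Counter(p_sample), Counter(q_sample)
    n_p, n_q = len(p_sample), len(q_sample)
    support = set(p) | set(q)  # Counter returns 0 for unseen points
    return 0.5 * sum(abs(p[x] / n_p - q[x] / n_q) for x in support)


def extra_ood_term(
    cost: float, train_x: Sequence[Hashable], test_x: Sequence[Hashable]
) -> float:
    """Hypothetical helper: abstention cost times the empirical statistical distance."""
    return cost * total_variation(train_x, test_x)


print(abstention_loss([0, 1, None, 1], [0, 0, 1, 1], cost=0.3))  # 0.325
print(total_variation(["a", "a", "b", "b"], ["a", "b", "c", "c"]))  # 0.5
print(extra_ood_term(0.3, ["a", "a", "b", "b"], ["a", "b", "c", "c"]))  # 0.15
```

Comparing empirical samples this way is only meaningful for a small discrete domain: with continuous inputs, two samples rarely share points, so the empirical total variation is typically 1 even when the underlying distributions are close. Treat the helper as a toy illustration of the extra term, not as an estimator.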

[1] Olivier Bousquet et al. Fast classification rates without standard margin assumptions, 2019, ArXiv.

[2] J. G. Pierce et al. Geometric Algorithms and Combinatorial Optimization, 2016.

[3] Thomas J. Walsh et al. Knows what it knows: a framework for self-aware learning, 2008, ICML '08.

[4] Morteza Zadimoghaddam et al. Trading off Mistakes and Don't-Know Predictions, 2010, NIPS.

[5] Yicheng Fang et al. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR, 2020, Radiology.

[6] Mehryar Mohri et al. Learning with Rejection, 2016, ALT.

[7] Richard A. Tapia et al. A trust region strategy for nonlinear equality constrained optimization, 1984.

[8] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.

[9] Peter L. Bartlett et al. Classification with a Reject Option using a Hinge Loss, 2008, J. Mach. Learn. Res.

[10] Ronald L. Rivest et al. Learning complicated concepts reliably and usefully, 1988, COLT '88.

[11] Ameet Talwalkar et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.

[12] Ramesh Nallapati et al. A Comparative Study of Methods for Transductive Transfer Learning, 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[13] Jonathon Shlens et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[14] Daniel Bienstock et al. A Note on Polynomial Solvability of the CDT Problem, 2014, SIAM J. Optim.

[15] Neil D. Lawrence et al. Dataset Shift in Machine Learning, 2009.

[16] Yishay Mansour et al. Learning Bounds for Importance Weighting, 2010, NIPS.

[17] C. K. Chow et al. An optimum character recognition system using decision functions, 1957, IRE Trans. Electron. Comput.

[18] Di Tang et al. Stealthy Porn: Understanding Real-World Adversarial Images for Illicit Online Promotion, 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[19] Yishay Mansour et al. Domain Adaptation: Learning Bounds and Algorithms, 2009, COLT.