Detecting Covariate Shift with Black Box Predictors

Many machine learning algorithms for classifying signals or images $X$ among a set of discrete labels $Y$ are trained on labeled instances, from which a predictor of $P_{Y\vert X}$ is learned under the training data distribution $P_{X,Y}$. This predictor is later used to label new instances of $X$, which are therefore assumed to be drawn from the same distribution. Since this assumption underlies many real-world applications, it is of great importance to monitor the reliability of the classification when the test-set statistics differ from those of the training set. This paper makes a step in that direction by proposing a Black Box Shift Detector of such data evolution (covariate shift). 'Black Box' here means that no knowledge of the predictor's architecture is required. Experiments demonstrate accurate detection on several high-dimensional datasets of natural images.
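The abstract does not specify the test statistic the detector uses, so the following is only an illustrative sketch of the black-box idea: query the predictor on reference (training-distribution) data and on incoming test data, and compare the two samples of output confidences with a standard two-sample Kolmogorov-Smirnov test. The `predict_proba` stand-in, the class-0 confidence score, and the significance level `alpha` are all assumptions for the example, not details from the paper.

```python
import numpy as np
from scipy.stats import ks_2samp

def predict_proba(x):
    """Hypothetical black-box predictor: any map from inputs to
    class-probability vectors; its internals are never inspected.
    Here, a toy fixed linear scorer followed by a softmax."""
    logits = x @ np.array([[1.0, -1.0], [-0.5, 0.5]])
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def detect_shift(x_ref, x_test, alpha=0.05):
    """Flag covariate shift by a two-sample KS test on the
    predictor's confidence for class 0 over both samples."""
    s_ref = predict_proba(x_ref)[:, 0]
    s_test = predict_proba(x_test)[:, 0]
    stat, p = ks_2samp(s_ref, s_test)
    return p < alpha, p

rng = np.random.default_rng(0)
x_ref = rng.normal(0.0, 1.0, size=(500, 2))   # reference distribution
x_out = rng.normal(2.0, 1.0, size=(500, 2))   # mean-shifted covariates

shift_diff, p_diff = detect_shift(x_ref, x_out)
```

Because only `predict_proba` outputs are consumed, the same detector applies unchanged to any classifier, which is the practical appeal of a black-box formulation.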
