Active Testing: Sample-Efficient Model Evaluation

We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, the existing literature largely ignores the cost of labeling test data, typically assuming unrealistically large test sets for model evaluation. This creates a disconnect with real applications, where test labels are important and just as expensive as training labels, e.g. when optimizing hyperparameters. Active testing addresses this by carefully selecting the test points to label, ensuring model evaluation is sample-efficient. To this end, we derive theoretically grounded and intuitive acquisition strategies that are specifically tailored to the goals of active testing, noting that these are distinct from those of active learning. Because actively selecting labels introduces a bias, we further show how to remove this bias while also reducing the variance of the estimator. Active testing is easy to implement and can be applied to any supervised machine learning method. We demonstrate its effectiveness on models including WideResNets and Gaussian processes, and on datasets including Fashion-MNIST and CIFAR-100.
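The bias-correction idea in the abstract can be illustrated with a minimal sketch: acquire test points with probability proportional to an acquisition score (e.g. a surrogate's predicted loss), then reweight the observed losses with importance weights so that the risk estimate stays unbiased. The function name and the with-replacement sampling scheme here are illustrative assumptions, not the paper's method; the paper uses without-replacement acquisition with a dedicated unbiased estimator.

```python
import numpy as np

def active_test_estimate(losses, scores, m, rng=None):
    """Unbiased estimate of the mean pool loss from m actively chosen labels.

    losses: per-point test losses (in practice, observed only after labeling)
    scores: positive acquisition scores; higher scores are sampled more often
    m:      number of test labels acquired
    """
    rng = np.random.default_rng(rng)
    n = len(losses)
    q = scores / scores.sum()           # acquisition distribution over the pool
    idx = rng.choice(n, size=m, p=q)    # actively acquired test points
    weights = 1.0 / (n * q[idx])        # importance weights remove the selection bias
    return np.mean(weights * losses[idx])
```

If the scores happen to be exactly proportional to the true losses, every weighted sample equals the mean loss and the estimator has zero variance; uniform scores recover ordinary Monte Carlo evaluation. This is why acquisition strategies that approximate the per-point loss are a natural fit for active testing.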
