ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms

Machine learning models are used extensively in many important areas, but there is no guarantee that a model will always perform well or as its developers intended. Understanding the correctness of a model is crucial for preventing failures that could have serious detrimental impact in critical application areas. In this paper, we propose a novel framework for efficiently testing a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN). We develop a novel data augmentation method that helps train the BNN to high accuracy, and we devise an information-theoretic sampling strategy that selects data points so the metrics of interest can be estimated accurately. Finally, we conduct an extensive set of experiments testing various machine learning models on different types of metrics. The experiments show that our method's metric estimates are significantly better than those of existing baselines.
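To make the idea concrete, here is a minimal sketch (not the paper's implementation) of such an active-testing loop: a BNN surrogate scores unlabeled pool points with a BALD-style mutual-information criterion computed from Monte Carlo forward passes, the highest-scoring points are sent to a labeling oracle, and the surrogate's predictions then stand in for ground-truth labels when estimating the model-under-test's accuracy. The `bnn` interface (`fit`, stochastic `predict_proba`), the `oracle` callback, and the use of MC dropout are all assumptions for illustration.

```python
import numpy as np

def bald_scores(probs):
    """BALD-style score: mutual information between the predicted label
    and the BNN parameters, approximated from MC forward passes.
    probs: array of shape (n_mc, n_points, n_classes)."""
    mean = probs.mean(axis=0)
    entropy_of_mean = -(mean * np.log(mean + 1e-12)).sum(axis=-1)        # H[y | x]
    mean_of_entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_of_entropy                             # I[y; w | x]

def active_test(model_under_test, bnn, oracle, pool_x, budget, n_mc=20):
    """Label the most informative pool points, retrain the BNN surrogate,
    then estimate the model-under-test's accuracy using the surrogate's
    predictions as proxy labels for the still-unlabeled pool.
    Assumes `bnn` is already initialized well enough to give stochastic
    predictions (e.g. MC dropout kept active at inference time)."""
    labeled_x, labeled_y = [], []
    for _ in range(budget):
        probs = np.stack([bnn.predict_proba(pool_x) for _ in range(n_mc)])
        i = int(np.argmax(bald_scores(probs)))   # most informative point
        labeled_x.append(pool_x[i])
        labeled_y.append(oracle(pool_x[i]))      # query the true label
        pool_x = np.delete(pool_x, i, axis=0)
        bnn.fit(np.array(labeled_x), np.array(labeled_y))
    proxy = np.stack([bnn.predict_proba(pool_x)
                      for _ in range(n_mc)]).mean(axis=0).argmax(axis=-1)
    return float((model_under_test.predict(pool_x) == proxy).mean())
```

In the same spirit, any metric computable from labels (e.g. precision, recall, or F1) can be estimated by substituting the surrogate's proxy labels into the metric's formula, and the data augmentation described above would be applied when fitting `bnn`.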
