Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories

For machine learning models trained with limited labeled data, validation stands to become the main bottleneck to reducing overall annotation costs. We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories, where finding relevant examples to evaluate on is particularly challenging. Our key insight is that simultaneous calibration and importance sampling enables accurate estimates even in the low-sample regime (< 300 samples). Critically, we also derive an accurate single-trial estimator of the variance of our method and demonstrate that this estimator is empirically accurate at low sample counts, enabling a practitioner to know how well they can trust a given low-sample estimate. When validating state-of-the-art semi-supervised models on ImageNet and iNaturalist 2017, our method achieves the same estimates of model performance with up to 10× fewer labels than competing approaches. In particular, we can estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
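The core idea of importance-sampling-based F1 estimation can be illustrated with a minimal sketch: instead of labeling a uniform sample of the test pool, label points drawn from a proposal distribution that concentrates on the rare category, then reweight the labeled points to recover unbiased estimates of the TP/FP/FN masses that F1 depends on. Everything below (the synthetic pool, the score distributions, and the simple score-proportional proposal) is an illustrative assumption for exposition; the paper's actual method additionally calibrates the scores and uses its own proposal and variance estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pool standing in for an unlabeled test set with a rare category.
# The prevalence and score distributions are illustrative assumptions.
n = 10_000
labels = rng.random(n) < 0.05                        # ~5% prevalence (rare class)
scores = np.where(labels, rng.beta(6, 2, n), rng.beta(1, 12, n))
preds = scores > 0.5                                 # classifier's hard decisions

# Proposal distribution: oversample points the model scores highly, since
# they dominate the TP/FP/FN terms that F1 depends on.
q = scores + 0.01
q /= q.sum()

budget = 300                                         # low-sample regime
idx = rng.choice(n, size=budget, replace=True, p=q)
w = (1.0 / n) / q[idx]                               # importance weights vs. uniform

# Importance-weighted (Horvitz-Thompson style) estimates of the F1 components,
# using only the `budget` labels that would actually be collected.
y, yhat = labels[idx], preds[idx]
tp = np.sum(w * (y & yhat))
fp = np.sum(w * (~y & yhat))
fn = np.sum(w * (y & ~yhat))
f1_est = 2 * tp / (2 * tp + fp + fn)

# Exhaustive F1 on the full pool, available here only because the pool is synthetic.
tp_t = np.sum(labels & preds)
f1_true = 2 * tp_t / (2 * tp_t + np.sum(~labels & preds) + np.sum(labels & ~preds))
print(f"estimated F1 = {f1_est:.3f}  (true F1 = {f1_true:.3f}, {budget} labels)")
```

With a uniform sample of the same size, only about 15 of the 300 labels would land on the rare class, making the F1 estimate far noisier; the proposal distribution is what makes the small budget sufficient.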
