A Meta-Learning Approach to Predicting Performance and Data Requirements

We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle for estimating model performance, leads to a large error when a small dataset (e.g., 5 samples per class) is used for extrapolation. This is because the log of the performance error, plotted against the log of the dataset size, follows a nonlinear progression in the few-shot regime followed by a linear progression in the high-shot regime. We introduce a novel piecewise power law (PPL) that handles the two data regimes differently. To estimate the parameters of the PPL, we introduce a random forest regressor trained via meta-learning that generalizes across classification/detection tasks, ResNet/ViT-based architectures, and random/pre-trained initializations. Compared to the power law, the PPL improves the performance estimation on average by 37% across 16 classification datasets and by 33% across 10 detection datasets. We further extend the PPL to provide a confidence bound and use it to limit the prediction horizon, which reduces over-estimation of data by 76% on classification and 91% on detection datasets.
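To make the baseline concrete, the following is a minimal sketch of power-law extrapolation, the method the PPL is compared against. It is not the PPL itself or the authors' code. It assumes the error follows error = a * n^b with b < 0, fits a and b by linear least squares in log-log space, and inverts the fit to estimate the dataset size needed for a target error. The function names and the synthetic data points are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_power_law(n_samples, errors):
    """Fit error ~= a * n**b by linear least squares in log-log space."""
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # polyfit returns the highest-degree coefficient first: slope, then intercept.
    b, log_a = np.polyfit(log_n, log_e, deg=1)
    return float(np.exp(log_a)), float(b)


def samples_for_target(target_error, a, b):
    """Invert error = a * n**b to get the n that reaches target_error."""
    if b >= 0:
        raise ValueError("error does not decrease with n; cannot extrapolate")
    return float((target_error / a) ** (1.0 / b))


# Synthetic, noise-free learning curve (assumed values): error = 2 * n**-0.5.
sizes = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
errs = 2.0 * sizes ** -0.5

a, b = fit_power_law(sizes, errs)
n_needed = samples_for_target(0.05, a, b)
print(f"a={a:.3f}, b={b:.3f}, samples for 5% error: {n_needed:.0f}")
```

On real learning curves the few-shot points deviate from a straight line in log-log space, so a single linear fit of this kind can extrapolate poorly from small datasets, which is the failure mode the abstract describes.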
