Memory Efficient Meta-Learning with Large Images

Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task’s entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or making trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory-efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task’s training images. This enables us to perform a forward pass on a task’s entire training set but realize significant memory savings by back-propagating only a random subset of these images, which we show yields an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB+MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.
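To make the training scheme described above concrete, here is a minimal PyTorch sketch of the idea: forward the entire support set, but back-propagate through only a random, rescaled subset of it. A prototypical-network-style head is used purely for illustration; the names (encoder, lite_update, backprop_size) and the choice of head are assumptions for this sketch, not the paper's implementation.

```python
# Minimal sketch (assumed names, not the paper's code): one episodic training step
# that forwards the full support set but back-propagates through only a random subset.
import torch
import torch.nn.functional as F


def lite_update(encoder, support_x, support_y, query_x, query_y,
                num_classes, backprop_size, optimizer):
    n = support_x.shape[0]
    perm = torch.randperm(n, device=support_x.device)
    grad_idx, nograd_idx = perm[:backprop_size], perm[backprop_size:]

    # Forward the bulk of the support set without storing activations.
    with torch.no_grad():
        feats_nograd = encoder(support_x[nograd_idx])

    # Forward a small random subset with gradients enabled. Rescaling the gradient
    # path by n / backprop_size keeps the estimate of the full support-set gradient
    # unbiased without changing forward values (f.detach() + s * (f - f.detach()) == f).
    f = encoder(support_x[grad_idx])
    feats_grad = f.detach() + (n / backprop_size) * (f - f.detach())

    feats = torch.cat([feats_grad, feats_nograd], dim=0)
    labels = torch.cat([support_y[grad_idx], support_y[nograd_idx]], dim=0)

    # Class prototypes from ALL support features; only a subset carries gradients.
    # Assumes every class appears at least once in the support set.
    protos = torch.stack([feats[labels == c].mean(dim=0) for c in range(num_classes)])

    # Query loss (negative Euclidean distance as logits) and a single optimizer step.
    query_logits = -torch.cdist(encoder(query_x), protos)
    loss = F.cross_entropy(query_logits, query_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under these assumptions, activations are stored only for the backprop_size support images (plus the query batch), so peak memory scales with the subset size rather than the full support set, while the n / backprop_size rescaling keeps the expected gradient equal to that of back-propagating through every support image.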
