Large-Scale Meta-Learning with Continual Trajectory Shifting

Meta-learning of shared initialization parameters has been shown to be highly effective for few-shot learning tasks. However, extending the framework to many-shot scenarios, which could further enhance its practicality, has been relatively overlooked due to the technical difficulty of meta-learning over long chains of inner-gradient steps. In this paper, we first show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale task distributions and thus yields better initialization points. Further, in order to increase the frequency of meta-updates even with excessively long inner-optimization trajectories, we propose to estimate the required shift of the task-specific parameters with respect to the change of the initialization parameters. By doing so, we can arbitrarily increase the frequency of meta-updates, which greatly improves meta-level convergence as well as the quality of the learned initializations. We validate our method on a heterogeneous set of large-scale tasks and show that it significantly outperforms previous first-order meta-learning methods, as well as multi-task learning and fine-tuning baselines, in terms of both generalization performance and convergence.
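To make the core idea concrete, below is a minimal toy sketch of the mechanism the abstract describes: meta-updates are performed after every inner step, and each in-progress task trajectory is shifted by an estimate of how it would have moved had it started from the updated initialization. This is an illustrative assumption-laden example (quadratic per-task losses, a Reptile-style first-order meta-update, and the shift approximated by the raw change in the initialization), not the authors' exact algorithm.

```python
# Hedged sketch: frequent first-order meta-updates with trajectory shifting.
# Toy setup, NOT the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
dim, num_tasks = 5, 8
task_optima = rng.normal(size=(num_tasks, dim))      # each task's true optimum

def inner_grad(theta, optimum):
    """Gradient of the toy per-task loss 0.5 * ||theta - optimum||^2."""
    return theta - optimum

phi = np.zeros(dim)                                   # shared initialization
theta = np.tile(phi, (num_tasks, 1))                  # task-specific parameters
inner_lr, meta_lr, total_inner_steps = 0.1, 0.05, 200

for step in range(total_inner_steps):
    # One inner-gradient step per task along its own (long) trajectory.
    for i in range(num_tasks):
        theta[i] -= inner_lr * inner_grad(theta[i], task_optima[i])

    # Frequent (per-step) first-order meta-update, Reptile-style:
    # move the initialization toward the current task-specific parameters.
    phi_new = phi + meta_lr * (theta.mean(axis=0) - phi)

    # Trajectory shifting (first-order approximation, an assumption here):
    # translate every in-progress trajectory by the change in the
    # initialization, as if it had started from the updated phi.
    theta += phi_new - phi
    phi = phi_new

print("learned initialization:", np.round(phi, 3))
print("mean of task optima:   ", np.round(task_optima.mean(axis=0), 3))
```

On this toy problem the learned initialization drifts toward the mean of the task optima, illustrating how per-step meta-updates can proceed without waiting for each long inner-optimization trajectory to finish.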
