Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace

Gradient-based meta-learning methods leverage gradient descent to learn the commonalities among various tasks. While previous such methods have been successful in meta-learning tasks, they resort to plain gradient descent during meta-testing. Our primary contribution is the MT-net, in which the meta-learner learns, on each layer's activation space, a subspace in which the task-specific learner performs gradient descent. Additionally, the task-specific learner of an MT-net performs gradient descent with respect to a meta-learned distance metric, which warps the activation space to be more sensitive to task identity. We demonstrate that the dimension of this learned subspace reflects the complexity of the task-specific learner's adaptation task, and that our model is less sensitive to the choice of initial learning rates than previous gradient-based meta-learning methods. Our method achieves state-of-the-art or comparable performance on few-shot classification and regression tasks.
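
To make the described update concrete, below is a minimal numpy sketch of the inner-loop step for a single linear layer with squared loss: a transformation T warps the layer's activation space, and task-specific gradient descent on the weights W is restricted to the subspace selected by a mask M. The variable names and the single-layer regression setup are illustrative assumptions, not the paper's implementation; in the full method, T and M are meta-learned across tasks (with only W adapted per task), which the sketch fixes by hand to isolate the geometry of the inner update.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3

W = rng.normal(size=(d_out, d_in))   # task-specific weights, adapted per task
T = rng.normal(size=(d_out, d_out))  # meta-learned transformation ("metric"); held fixed during adaptation
M = np.array([1.0, 0.0, 1.0])        # meta-learned binary mask: rows of W spanning the update subspace

def forward(x):
    # T warps the activation space: the pre-activation is T @ (W @ x)
    return T @ (W @ x)

# One inner-loop gradient step on a toy regression example
x = rng.normal(size=d_in)
y = rng.normal(size=d_out)
lr = 0.1

pred = forward(x)
err = pred - y                   # gradient of 0.5 * ||pred - y||^2 w.r.t. pred
grad_W = np.outer(T.T @ err, x)  # chain rule through the fixed transformation T
W -= lr * (M[:, None] * grad_W)  # update only the masked rows: descent in a learned subspace
```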
