Meta-Learning via Hypernetworks

Recent developments in few-shot learning have shown that, during fast adaptation, gradient-based meta-learners mostly rely on the embedding features of powerful pretrained networks. This motivates us to investigate ways to effectively adapt features and exploit the meta-learner's full potential. Here, we demonstrate the effectiveness of hypernetworks in this context. We propose a soft row-sharing hypernetwork architecture and show that training the hypernetwork with a variant of MAML is tightly linked to meta-learning a curvature matrix used to condition gradients during fast adaptation. We achieve results comparable to state-of-the-art model-agnostic methods in the overparametrized case, while outperforming many MAML variants in the compressive regime without using different optimization schemes. Furthermore, we empirically show that hypernetworks do leverage the inner-loop optimization for better adaptation, and we analyse, on a toy problem, how they naturally try to learn the shared curvature of constructed tasks when trained with our proposed algorithm.
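
To make the setup concrete, the following is a minimal sketch of how a hypernetwork can be combined with MAML-style fast adaptation: a hypernetwork maps a task embedding to the weights of a small target classifier, the embedding is updated by gradient descent in the inner loop, and the outer loop meta-trains the hypernetwork through the differentiable inner updates. This is an illustrative assumption about one possible instantiation, not the paper's exact architecture or training procedure; all names, shapes, and hyperparameters (HyperNet, inner_adapt, inner_lr, the choice of adapting only the embedding) are hypothetical.

```python
# Minimal sketch (assumptions: PyTorch-style hypernetwork whose input embedding is
# adapted in the MAML inner loop; names, shapes, and hyperparameters are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    """Maps a task embedding to the weights of a small linear target classifier."""
    def __init__(self, embed_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # Generates all target-network parameters: a weight matrix plus a bias vector.
        self.generator = nn.Linear(embed_dim, out_dim * (in_dim + 1))

    def forward(self, x, embedding):
        params = self.generator(embedding)
        W = params[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = params[self.out_dim * self.in_dim:]
        return F.linear(x, W, b)

def inner_adapt(hnet, embedding, x_support, y_support, inner_lr=0.1, steps=5):
    """MAML-style fast adaptation of the task embedding (hypernetwork weights held fixed)."""
    emb = embedding.clone().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(hnet(x_support, emb), y_support)
        (grad,) = torch.autograd.grad(loss, emb, create_graph=True)
        emb = emb - inner_lr * grad  # differentiable update, so outer-loop meta-gradients flow
    return emb

# Toy usage: 5-way classification with 16-dim inputs and a 32-dim task embedding.
hnet = HyperNet(embed_dim=32, in_dim=16, out_dim=5)
embedding = torch.zeros(32)
x_s, y_s = torch.randn(10, 16), torch.randint(0, 5, (10,))
x_q, y_q = torch.randn(10, 16), torch.randint(0, 5, (10,))
adapted = inner_adapt(hnet, embedding, x_s, y_s)
outer_loss = F.cross_entropy(hnet(x_q, adapted), y_q)
outer_loss.backward()  # meta-gradient reaches the hypernetwork's parameters
```

Because the inner update is differentiable, the outer-loop gradient passes through the hypernetwork's Jacobian applied to the inner-loop step, which is one way to see how such training can implicitly act like a meta-learned curvature (preconditioning) matrix on the fast-adaptation gradients.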
