Empirical Bayes Meta-Learning with Synthetic Gradients

We revisit the hierarchical Bayes and empirical Bayes formulations for multi-task learning, which can naturally be applied to meta-learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior of each task. We derive an amortized variational inference that couples all the variational posteriors into a meta-model, which consists of a synthetic gradient network and an initialization network. Our empirical results on the mini-ImageNet benchmark for episodic few-shot classification significantly outperform previous state-of-the-art methods.

1 Meta-learning with transductive inference

The goal of meta-learning is to train a meta-model on a collection of tasks such that it performs well on another, disjoint collection of tasks. Suppose that we are given a collection of $N$ training tasks with associated data $\mathcal{D} := \{d_t = (x_t, y_t)\}_{t=1}^{N}$. In the case of few-shot learning, we are additionally given a support set $d_t^{s}$ for each task. In this section, we revisit the classical empirical Bayes model for meta-learning. We then propose a transductive scheme for the variational inference, constructing the variational posterior as a function of the query inputs $x_t$.

1.1 Empirical Bayes model

Due to the hierarchical structure of the data, it is natural to consider a hierarchical Bayes model for the marginal likelihood.
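Concretely, with task-specific parameters $w_t$ and a shared meta-level parameter $\theta$ (notation introduced here for illustration, not necessarily that of the rest of the paper), the hierarchical Bayes marginal likelihood takes the standard form

$$
p(\mathcal{D}) \;=\; \int \Big[ \prod_{t=1}^{N} \int p(d_t \mid w_t)\, p(w_t \mid \theta)\, \mathrm{d}w_t \Big]\, p(\theta)\, \mathrm{d}\theta ,
$$

and empirical Bayes replaces the hyper-prior $p(\theta)$ with a point estimate $\hat{\theta}$ obtained by maximizing the resulting marginal likelihood $\prod_{t=1}^{N} \int p(d_t \mid w_t)\, p(w_t \mid \hat{\theta})\, \mathrm{d}w_t$.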
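To make the amortized, transductive inference described above concrete, the following is a minimal PyTorch-style sketch (our own illustration, not the authors' implementation): an initialization network maps support-set features to initial classifier weights, and a synthetic gradient network maps the classifier's logits on the unlabeled query inputs $x_t$ to an update direction, so the resulting task-specific weights depend on $x_t$. All class and parameter names (SyntheticGradientMetaModel, feat_dim, n_steps, etc.) are assumptions for this sketch.

```python
import torch
import torch.nn as nn


class SyntheticGradientMetaModel(nn.Module):
    """Illustrative sketch (not the authors' code): a linear task classifier whose
    weights are refined with updates predicted by a synthetic gradient network,
    making the inferred weights transductive in the query inputs."""

    def __init__(self, feat_dim=64, n_classes=5, n_steps=3, lr=1e-3):
        super().__init__()
        # Initialization network: averaged support features -> initial classifier weights.
        self.init_net = nn.Linear(feat_dim, feat_dim * n_classes)
        # Synthetic gradient network: query logits -> synthetic gradient w.r.t. the logits.
        self.grad_net = nn.Sequential(
            nn.Linear(n_classes, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )
        self.feat_dim, self.n_classes = feat_dim, n_classes
        self.n_steps, self.lr = n_steps, lr

    def forward(self, support_feats, query_feats):
        # support_feats: (n_support, feat_dim), query_feats: (n_query, feat_dim),
        # e.g. produced by a shared feature extractor (not shown here).
        w = self.init_net(support_feats.mean(dim=0)).view(self.n_classes, self.feat_dim)
        for _ in range(self.n_steps):
            logits = query_feats @ w.t()            # predictions on the unlabeled queries
            synth_grad = self.grad_net(logits)      # predicted dL/dlogits (no labels needed)
            # Chain rule through the linear head: dL/dw = (dL/dlogits)^T @ query_feats.
            w = w - self.lr * (synth_grad.t() @ query_feats)
        return query_feats @ w.t()                  # final query logits


# Usage on a toy 5-way episode with 25 support and 75 query feature vectors.
model = SyntheticGradientMetaModel(feat_dim=64, n_classes=5)
query_logits = model(torch.randn(25, 64), torch.randn(75, 64))
```

In a full training loop, both networks would be meta-trained across episodes; the sketch only shows how a task-specific posterior mean over classifier weights can be computed as a function of the query inputs.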
