TaskNorm: Rethinking Batch Normalization for Meta-Learning

Modern meta-learning approaches for image classification rely on increasingly deep networks to achieve state-of-the-art performance, making batch normalization an essential component of meta-learning pipelines. However, the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective, giving rise to the need to rethink normalization in this setting. We evaluate a range of approaches to batch normalization for meta-learning scenarios, and develop a novel approach that we call TaskNorm. Experiments on fourteen datasets demonstrate that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient-based and gradient-free meta-learning approaches. Importantly, TaskNorm is found to consistently improve performance. Finally, we provide a set of best practices for normalization that will allow fair comparison of meta-learning algorithms.
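The abstract does not spell out TaskNorm's mechanism, but the core issue it raises is that conventional batch normalization pools statistics over an arbitrary mini-batch, whereas in meta-learning each task carries its own data distribution. Below is a minimal, hypothetical sketch of a task-level normalization layer that normalizes query examples with moments computed from the task's support set; the class name, blending-free formulation, and all method names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class SupportSetBatchNorm(nn.Module):
    """Hypothetical sketch: normalize with moments pooled over the
    current task's support set rather than over an arbitrary batch."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        # Learned affine parameters, as in standard batch norm.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Per-task statistics, refreshed for every new task.
        self.register_buffer("support_mean", torch.zeros(num_features))
        self.register_buffer("support_var", torch.ones(num_features))

    def compute_support_statistics(self, support_x):
        # support_x: (N, C, H, W) feature maps for the task's support set.
        self.support_mean = support_x.mean(dim=(0, 2, 3))
        self.support_var = support_x.var(dim=(0, 2, 3), unbiased=False)

    def forward(self, x):
        # Normalize any input (support or query) with the task's moments.
        mean = self.support_mean.view(1, -1, 1, 1)
        var = self.support_var.view(1, -1, 1, 1)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

In use, `compute_support_statistics` would be called once per task on the support-set activations before processing queries, so query predictions do not leak information across tasks the way pooled mini-batch statistics can.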
