Meta-Transfer Learning for Few-Shot Learning

Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. Because deep neural networks (DNNs) tend to overfit when trained on only a few samples, meta-learning typically uses shallow neural networks (SNNs), which limits its effectiveness. In this paper we propose a novel few-shot learning method called meta-transfer learning (MTL), which learns to adapt a deep NN to few-shot learning tasks. Specifically, "meta" refers to training on multiple tasks, and "transfer" is achieved by learning per-task scaling and shifting functions for the DNN weights. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL. We conduct experiments on (5-class, 1-shot) and (5-class, 5-shot) recognition tasks using two challenging few-shot learning benchmarks: miniImageNet and Fewshot-CIFAR100. Extensive comparisons to related works validate that our meta-transfer learning approach, trained with the proposed HT meta-batch scheme, achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy.
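To make the transfer mechanism concrete, the sketch below illustrates the scaling-and-shifting idea on a single convolution layer: the pretrained weights are frozen and only lightweight per-channel scale and shift parameters are meta-learned. This is a minimal PyTorch-style illustration under our own naming assumptions (the SSConv2d class, scale, and shift parameters are hypothetical), not the authors' reference implementation.

# Minimal sketch of scaling-and-shifting (SS) for meta-transfer learning:
# freeze the pretrained convolution weights W and bias b, and meta-learn only
# per-channel scale (init 1) and shift (init 0) parameters.
# Names are illustrative, not the paper's reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSConv2d(nn.Module):
    def __init__(self, pretrained_conv: nn.Conv2d):
        super().__init__()
        # Frozen weights learned from large-scale pre-training.
        self.weight = nn.Parameter(pretrained_conv.weight.data.clone(),
                                   requires_grad=False)
        self.bias = (nn.Parameter(pretrained_conv.bias.data.clone(),
                                  requires_grad=False)
                     if pretrained_conv.bias is not None else None)
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding
        out_channels = pretrained_conv.out_channels
        # Meta-learned parameters: one scale and one shift per output channel.
        self.scale = nn.Parameter(torch.ones(out_channels, 1, 1, 1))
        self.shift = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        # Modulate the frozen weights by the learned scale, then add the shift.
        w = self.weight * self.scale
        b = self.shift if self.bias is None else self.bias + self.shift
        return F.conv2d(x, w, b, stride=self.stride, padding=self.padding)

# Usage: wrap each conv layer of a pretrained backbone and meta-train only the
# scale/shift parameters (plus the task-specific classifier head).
if __name__ == "__main__":
    conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
    ss_conv = SSConv2d(conv)
    out = ss_conv(torch.randn(2, 3, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])

Because only the scale/shift parameters and the classifier are updated during meta-training, the number of task-specific parameters stays small, which is what allows a deep backbone to be used without overfitting to the few labeled samples.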
