Revisiting Meta-Learning as Supervised Learning

Recent years have witnessed an abundance of new publications and approaches in meta-learning. This community-wide enthusiasm has sparked great insights but has also created a plethora of seemingly different frameworks, which can be hard to compare and evaluate. In this paper, we aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning. By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning. This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning. For example, we obtain a better understanding of generalization properties, and we can readily transfer well-understood techniques such as model ensembling, pre-training, joint training, data augmentation, and even nearest-neighbor-based methods. We provide intuitive analogues of these methods in the context of meta-learning and show that they give rise to significant improvements in model performance on few-shot learning.
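To make the (feature, label) reduction concrete, the following is a minimal sketch, not the method used in the paper. It assumes a toy family of 1-D linear regression tasks in which every task shares the same support inputs, so a task's data set can be represented by its vector of outputs and its target model by a single slope. The names `sample_task`, `make_meta_dataset`, `X_SUPPORT`, and the linear meta-learner `theta` are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                                   # shots (support points) per task
X_SUPPORT = np.linspace(-1.0, 1.0, K)   # assumption: inputs shared by all tasks


def sample_task(rng):
    """Return one meta-sample: (task data set, target model)."""
    w = rng.normal()                                  # target model: a slope
    y = w * X_SUPPORT + 0.1 * rng.normal(size=K)      # noisy few-shot outputs
    return y, w


def make_meta_dataset(n_tasks, rng):
    """Stack tasks into a supervised data set of (feature, label) pairs."""
    pairs = [sample_task(rng) for _ in range(n_tasks)]
    features = np.stack([y for y, _ in pairs])        # one row per task
    labels = np.array([w for _, w in pairs])          # one target model per task
    return features, labels


# Meta-training and meta-test sets play the role of ordinary train/test splits.
train_F, train_w = make_meta_dataset(200, rng)
test_F, test_w = make_meta_dataset(50, rng)

# The "meta-learner" is an ordinary supervised model: a linear map from a
# task's data set to its model parameters, fit by least squares.
theta, *_ = np.linalg.lstsq(train_F, train_w, rcond=None)

# Generalization is measured on unseen tasks, as on unseen samples.
pred_w = test_F @ theta
meta_test_mse = float(np.mean((pred_w - test_w) ** 2))
print(f"meta-test MSE on predicted slopes: {meta_test_mse:.4f}")
```

Under this view, held-out tasks are the test samples, so standard supervised tools (a train/test split over tasks, augmentation of tasks, ensembling of meta-learners) apply to the meta-learner directly.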
