Unsupervised Learning via Meta-Learning

A central goal of unsupervised learning is to acquire representations from unlabeled data or experience that can be used for more effective learning of downstream tasks from modest amounts of labeled data. Many prior unsupervised learning works aim to do so by developing proxy objectives based on reconstruction, disentanglement, prediction, and other criteria. Instead, we develop an unsupervised meta-learning method that explicitly optimizes for the ability to learn a variety of tasks from small amounts of data. To do so, we automatically construct tasks from unlabeled data and run meta-learning over the constructed tasks. Surprisingly, we find that, when integrated with meta-learning, relatively simple task construction mechanisms, such as clustering embeddings, lead to good performance on a variety of downstream, human-specified tasks. Our experiments across four image datasets indicate that our unsupervised meta-learning approach acquires a learning algorithm, without any labeled data, that is applicable to a wide range of downstream classification tasks, improving upon the embeddings learned by four prior unsupervised learning methods.

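As a rough illustration of the task-construction idea described above, the sketch below clusters embeddings of unlabeled examples and samples N-way, K-shot tasks whose "classes" are clusters and whose labels are cluster assignments. This is not the authors' code; the embedding input, cluster count, task sizes, and helper names are illustrative assumptions.

```python
# Hedged sketch (assumed details, not the paper's implementation):
# build few-shot tasks from unlabeled data by clustering its embeddings.
import numpy as np
from sklearn.cluster import KMeans

def construct_tasks(embeddings, num_tasks=100, n_way=5, k_shot=1,
                    q_queries=5, num_clusters=500, seed=0):
    """Cluster unlabeled embeddings, then sample N-way, K-shot tasks whose
    classes are clusters and whose pseudo-labels are cluster assignments."""
    rng = np.random.default_rng(seed)
    cluster_ids = KMeans(n_clusters=num_clusters,
                         random_state=seed).fit_predict(embeddings)
    # Keep only clusters large enough to supply both support and query examples.
    usable = [c for c in range(num_clusters)
              if np.sum(cluster_ids == c) >= k_shot + q_queries]
    tasks = []
    for _ in range(num_tasks):
        classes = rng.choice(usable, size=n_way, replace=False)
        support, query = [], []
        for label, c in enumerate(classes):
            members = np.flatnonzero(cluster_ids == c)
            chosen = rng.choice(members, size=k_shot + q_queries, replace=False)
            support += [(idx, label) for idx in chosen[:k_shot]]
            query += [(idx, label) for idx in chosen[k_shot:]]
        tasks.append((support, query))  # dataset indices paired with pseudo-labels
    return tasks
```

Tasks produced this way could then be fed to any standard meta-learner (e.g., a gradient-based or metric-based few-shot method) exactly as if they were human-labeled few-shot classification episodes.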