Online Unsupervised Learning of Visual Representations and Categories

Real-world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution. Furthermore, real-world interactions demand learning on-the-fly from few or no class labels. In this work, we propose an unsupervised model that simultaneously performs online visual representation learning and few-shot learning of new categories without relying on any class labels. Our model is a prototype-based memory network with a control component that determines when to form a new class prototype. We formulate it as an online mixture model in which a component can be created from a single new example and assignments need not be balanced, which permits an approximation to the naturally imbalanced distributions of uncurated raw data. Learning includes a contrastive loss that encourages different views of the same image to be assigned to the same prototype. The result is a mechanism that forms categorical representations of objects in nonstationary environments. Experiments show that our method can learn from an online stream of visual input data and that its learned representations are significantly better at category recognition than those of state-of-the-art self-supervised learning methods.
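
To make the prototype-memory idea concrete, the following is a minimal sketch of an online assignment rule, not the model proposed here. It assumes precomputed, nonzero embedding vectors, cosine similarity, a fixed similarity threshold standing in for the control component, and a running-mean prototype update. The class name `OnlinePrototypeMemory` and the parameter `new_threshold` are hypothetical, and the contrastive loss and representation learning are omitted entirely.

```python
import numpy as np


class OnlinePrototypeMemory:
    """Toy prototype memory: assign each embedding to its nearest prototype,
    or create a new prototype when nothing is similar enough (assumed rule)."""

    def __init__(self, new_threshold=0.5):
        # Hypothetical stand-in for the control component: a fixed
        # cosine-similarity threshold below which a new prototype is formed.
        self.new_threshold = new_threshold
        self.prototypes = []  # unit-norm prototype vectors
        self.counts = []      # number of examples assigned to each prototype

    def observe(self, z):
        """Process one embedding from the stream and return its prototype index."""
        z = np.asarray(z, dtype=float)
        z = z / np.linalg.norm(z)  # assumes z is nonzero
        if self.prototypes:
            sims = np.array([p @ z for p in self.prototypes])
            k = int(np.argmax(sims))
            if sims[k] >= self.new_threshold:
                # Running-mean update, then project back to the unit sphere.
                self.counts[k] += 1
                mean = self.prototypes[k] + (z - self.prototypes[k]) / self.counts[k]
                self.prototypes[k] = mean / np.linalg.norm(mean)
                return k
        # No sufficiently similar prototype: create one from this single example.
        self.prototypes.append(z)
        self.counts.append(1)
        return len(self.prototypes) - 1


if __name__ == "__main__":
    memory = OnlinePrototypeMemory(new_threshold=0.5)
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print([memory.observe(z) for z in stream])
```

In this sketch a component is created from a single example and nothing forces the per-prototype counts to be equal, which mirrors the two properties of the online mixture formulation described above. The decision rule itself is only an illustrative assumption.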
