Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies

Intelligent behaviour in the real world requires the ability to acquire new knowledge from an ongoing sequence of experiences while preserving and reusing past knowledge. We propose a novel algorithm for unsupervised representation learning from piecewise stationary visual data: Variational Autoencoder with Shared Embeddings (VASE). Based on the Minimum Description Length principle, VASE automatically detects shifts in the data distribution and allocates spare representational capacity to new knowledge, while simultaneously protecting previously learnt representations from catastrophic forgetting. Our approach encourages the learnt representations to be disentangled, which imparts a number of desirable properties: VASE can deal sensibly with ambiguous inputs, it can enhance its own representations through imagination-based exploration, and most importantly, it exhibits semantically meaningful sharing of latents between different datasets. Compared to baselines with entangled representations, our approach is able to reason beyond surface-level statistics and perform semantically meaningful cross-domain inference.
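
The abstract describes VASE as a VAE-based model whose KL (description-length) cost governs how much representational capacity the latent code uses. As a point of reference only, the sketch below shows the generic beta-VAE objective that such a model starts from; the class and function names (TinyVAE, beta_vae_loss) are hypothetical, and the sketch deliberately omits the components that define VASE itself (shared embeddings, automatic detection of distribution shifts, and protection against forgetting).

```python
# Illustrative sketch only: a minimal beta-VAE objective of the kind VASE builds on.
# This is not the authors' code; all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=10, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping gradients w.r.t. mu, logvar
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def beta_vae_loss(x, x_logits, mu, logvar, beta=4.0):
    # Reconstruction term: negative log-likelihood under a Bernoulli decoder
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    # KL term: the "description length" of the latent code that beta penalises,
    # encouraging disentangled, capacity-limited representations
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Usage: for a batch of flattened binary images x in [0, 1],
#   model = TinyVAE(); loss = beta_vae_loss(x, *model(x))
```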
