Independently Controllable Factors

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could achieve this. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors can be discovered if the learner is allowed to interact with its environment: the agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment that are independently controllable, i.e., that for each such aspect there exists a policy and a learnable feature such that the policy can change that feature while causing minimal change to the other features that explain the statistical variation in the observed data. We propose a specific objective function for finding such factors, and we verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.
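
To make the idea concrete, here is a minimal sketch of the kind of objective the abstract alludes to. The notation is assumed for illustration rather than taken from the paper: f denotes a learned encoder with K feature dimensions, g a decoder, and pi_k the policy paired with feature f_k; the exact objective used in the paper may differ.

```latex
% A hedged sketch of a "selectivity" objective for independently
% controllable factors. All symbols are assumptions for illustration:
% f is a learned encoder with K feature dimensions, \pi_k is the
% policy paired with feature f_k, and s' is the state reached from
% state s after taking action a.

% Selectivity of feature k: how much f_k changes, relative to the
% total change across all features, when acting under \pi_k.
\[
\mathrm{sel}(s, a, k) \;=\;
\mathbb{E}_{s' \sim P(\cdot \mid s, a)}
\left[
\frac{\lvert f_k(s') - f_k(s) \rvert}
     {\sum_{k'} \lvert f_{k'}(s') - f_{k'}(s) \rvert}
\right]
\]

% Joint training: a reconstruction loss keeps the features informative
% about the observations, while each policy \pi_k is trained to
% maximize the selectivity of its own feature; \lambda trades off the
% two terms.
\[
\min_{f,\, g} \;
\mathbb{E}_{s}\!\left[ \lVert g(f(s)) - s \rVert^2 \right]
\;-\; \lambda \sum_{k=1}^{K}
\mathbb{E}_{s,\, a \sim \pi_k(\cdot \mid s)}
\big[ \mathrm{sel}(s, a, k) \big],
\qquad
\max_{\pi_k} \;
\mathbb{E}_{s,\, a \sim \pi_k(\cdot \mid s)}
\big[ \mathrm{sel}(s, a, k) \big]
\]
```

Since the selectivity term is an expectation over actions sampled from pi_k, its gradient with respect to the policy parameters would typically be estimated with a score-function (REINFORCE-style) estimator rather than computed in closed form.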
