Unsupervised State Representation Learning in Atari

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks. Learning such representations without supervision from rewards is a challenging open problem. We introduce a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. We also introduce a new benchmark based on Atari 2600 games, in which we evaluate representations by how well they capture the ground-truth state variables. We believe this new framework for evaluating representation learning models will be crucial for future representation learning research. Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods. The code associated with this work is available online.
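
As a rough illustration of what "maximizing mutual information" between temporally distinct features can look like in practice, the sketch below computes an InfoNCE-style contrastive loss in plain Python. The abstract does not say which estimator the method uses, so the choice of InfoNCE, the dot-product score, and every name here (`info_nce_loss`, `mi_lower_bound`, `feats_t`, `feats_next`) are assumptions made for illustration only. Features from consecutive time steps are treated as positive pairs, and the other items in the batch serve as negatives.

```python
import math
import random


def dot(u, v):
    """Dot product, used here as a hypothetical similarity score."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchors, positives):
    """Mean InfoNCE loss over a batch.

    positives[i] is the true match for anchors[i]; every other row of
    `positives` acts as a negative sample for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        scores = [dot(anchors[i], positives[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy of picking the true match among n candidates.
        total += log_norm - scores[i]
    return total / n


def mi_lower_bound(anchors, positives):
    """InfoNCE estimate log(n) - loss, which can never exceed log(n)."""
    return math.log(len(anchors)) - info_nce_loss(anchors, positives)


# Toy data: features at time t, and slightly perturbed features at t + 1.
random.seed(0)
batch, dim = 8, 16
feats_t = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
feats_next = [[x + random.gauss(0.0, 0.05) for x in row] for row in feats_t]

# Breaking the temporal pairing (rotating the batch by one) should raise the loss.
mismatched = feats_next[1:] + feats_next[:1]

aligned_loss = info_nce_loss(feats_t, feats_next)
mismatched_loss = info_nce_loss(feats_t, mismatched)
bound = mi_lower_bound(feats_t, feats_next)
```

In a learned setting the feature vectors would come from an encoder whose parameters are trained to minimize this loss. The sketch only shows that correctly paired features score a lower loss than mismatched ones, and that the resulting estimate is capped at the logarithm of the batch size.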
