Towards Governing Agent's Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning

We tackle the black-box problem of deep neural networks in reinforcement learning (RL), where neural agents learn to maximize reward in an uncontrolled way. Such a learning approach is risky when the interacting environment has a vast state space, because it is then nearly impossible to foresee every unwanted outcome and penalize it with negative rewards in advance. Rather than reverse-analyzing learned neural features as in prior work, our proposed method addresses the black-box issue by encouraging an RL policy network to learn interpretable latent features through disentangled representation learning. To this end, our method lets an RL agent assess its own efficacy by distinguishing its influence on the scene from uncontrollable environmental factors, closely mirroring how humans understand their surroundings. Our experimental results show that the learned latent factors are not only interpretable but also enable modeling the distribution of the entire visited state space under a specific action condition. Our experiments further show that this property of the proposed structure enables ex post facto governance of the desired behaviors of RL agents.
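
One plausible form of the training objective, sketched under the assumption that the method uses the standard β-VAE loss with the decoder additionally conditioned on the agent's action a_t (the paper's exact factorization may differ), is

\mathcal{L}(\theta, \phi;\, s_t, a_t, s_{t+1}) = \mathbb{E}_{q_\phi(z \mid s_t)}\big[\log p_\theta(s_{t+1} \mid z, a_t)\big] \;-\; \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid s_t) \,\|\, p(z)\big),

where setting \beta > 1 pressures the approximate posterior toward a factorized prior to encourage disentangled latent factors, and the action condition gives the latent space the leverage needed to separate agent-controllable factors of variation from environment-driven ones.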
