Analyzing the Hidden Activations of Deep Policy Networks: Why Representation Matters

We analyze the hidden activations of neural network policies of deep reinforcement learning (RL) agents and show, empirically, that it is possible to determine a priori whether a state representation will lend itself to fast learning. RL agents operating on high-dimensional states carry two main learning burdens: (1) learning an action-selection policy and (2) learning to discern useful from non-useful information in a given state. By learning a latent representation of these high-dimensional states with an auxiliary model, the second burden is effectively removed, leading to accelerated training progress. We examine this phenomenon across tasks in the PyBullet Kuka environment, where an agent must learn to control a robotic gripper to pick up an object. Our analysis reveals how neural network policies organize their internal representation of the state space over the course of training. The results of this analysis provide three main insights into how deep RL agents learn. First, a well-organized internal representation within the policy network is a prerequisite for learning a good action-selection policy. Second, a poor initial representation can cause an unrecoverable collapse within the policy network. Third, a good initial representation allows an agent's policy network to organize its internal representation even before any training begins.
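
The analysis described above hinges on capturing a policy network's hidden activations and measuring how organized they are. The sketch below illustrates one way such a probe could be set up; the policy architecture, layer sizes, random stand-in states, and the PCA-based organization measure are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch (hypothetical architecture, not the paper's exact network):
# capture a policy network's hidden activations with a forward hook and
# project them with PCA to inspect how the internal representation is organized.
import numpy as np
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP policy; the layer sizes here are illustrative only."""
    def __init__(self, state_dim=32, hidden_dim=64, action_dim=4):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        return self.head(self.hidden(state))

policy = PolicyNet()
activations = []

# Forward hook records the output of the hidden stack on every forward pass.
def record(module, inputs, output):
    activations.append(output.detach().cpu().numpy())

policy.hidden.register_forward_hook(record)

# Stand-in for states gathered from environment rollouts (e.g., the Kuka task).
states = torch.randn(512, 32)
with torch.no_grad():
    policy(states)

# PCA via SVD on the centered activations: the share of variance captured by a
# few leading components is one rough proxy for how "organized" the hidden
# representation is.
acts = np.concatenate(activations, axis=0)
acts = acts - acts.mean(axis=0, keepdims=True)
_, singular_values, _ = np.linalg.svd(acts, full_matrices=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)
print("variance explained by first 3 PCs:", explained[:3])
```

Running such a probe at several checkpoints during training would show whether the hidden representation becomes more structured as the policy improves, which is the kind of trajectory the abstract's three insights describe.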
