Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal-based robotics

Scaling end-to-end reinforcement learning to control real robots from vision presents a series of challenges, in particular in terms of sample efficiency. In contrast to end-to-end learning, state representation learning can learn a compact, efficient, and relevant representation of states that speeds up policy learning, reduces the number of samples needed, and is easier to interpret. We evaluate several state representation learning methods on goal-based robotics tasks and propose a new unsupervised model that stacks representations, combining the strengths of several of these approaches. This method encodes all the relevant features, performs on par with or better than end-to-end learning while being more sample efficient, and is robust to hyper-parameter changes.
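To illustrate the decoupling idea, the sketch below pre-trains a split ("stacked") state encoder with two unsupervised losses, a reconstruction loss on one latent slice and an inverse-dynamics loss on the other, and then hands the frozen features to the policy. This is a minimal sketch only: all module names, dimensions, and the specific loss combination are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of a split ("stacked") SRL model: one latent slice is trained
# with an autoencoder (reconstruction) loss, the other with an inverse-dynamics
# loss. Shapes, names, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SplitSRL(nn.Module):
    def __init__(self, obs_dim=64 * 64 * 3, ae_dim=128, inv_dim=64, n_actions=4):
        super().__init__()
        self.ae_dim = ae_dim
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, ae_dim + inv_dim),
        )
        # Decoder reconstructs the observation from the autoencoder slice only.
        self.decoder = nn.Sequential(
            nn.Linear(ae_dim, 256), nn.ReLU(),
            nn.Linear(256, obs_dim),
        )
        # Inverse model predicts the action from two consecutive inverse slices.
        self.inverse = nn.Linear(2 * inv_dim, n_actions)

    def forward(self, obs):
        z = self.encoder(obs)
        return z[:, :self.ae_dim], z[:, self.ae_dim:]  # (z_ae, z_inv)

    def loss(self, obs, next_obs, actions):
        z_ae, z_inv = self(obs)
        _, z_inv_next = self(next_obs)
        ae_loss = F.mse_loss(self.decoder(z_ae), obs)
        logits = self.inverse(torch.cat([z_inv, z_inv_next], dim=1))
        inv_loss = F.cross_entropy(logits, actions)
        return ae_loss + inv_loss


# After unsupervised pre-training, the frozen encoder replaces raw pixels as
# the policy input, decoupling feature extraction from policy gradients:
#   state = model.encoder(obs).detach()
```

Because the encoder is trained without reward supervision and then frozen, the policy network only has to map a low-dimensional state to actions, which is the source of the sample-efficiency gains the abstract describes.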
