Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

In many real-world scenarios, an autonomous agent encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train a latent pivotal state model and a curiosity-driven goal-conditioned policy in a task-agnostic manner. Second, provided with the information from the world graph, a high-level Manager quickly finds solutions to new tasks and expresses subgoals to a low-level Worker in reference to pivotal states. The Worker can then also leverage the graph to easily traverse to the pivotal states of interest, even across long distances, and explore non-locally. We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages in performance and efficiency of the proposed framework over baselines that lack world-graph knowledge.
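
As a rough illustration of the data structure the abstract describes, the sketch below shows a minimal world graph: pivotal states as nodes, feasible traversals as edges, and a breadth-first search that a Worker could use to plan a route between distant pivotal states. This is only a sketch under assumed names; the class and method names (WorldGraph, add_traversal, plan_traversal) and the example states are illustrative, not the paper's implementation.

# Minimal world-graph sketch (Python); names and states are illustrative assumptions.
from collections import deque


class WorldGraph:
    """Nodes are pivotal states; edges are feasible traversals between them."""

    def __init__(self):
        self.edges = {}  # pivotal state -> set of directly reachable pivotal states

    def add_traversal(self, s, t):
        # Traversals are treated as symmetric here for simplicity.
        self.edges.setdefault(s, set()).add(t)
        self.edges.setdefault(t, set()).add(s)

    def plan_traversal(self, start, goal):
        """Breadth-first search over pivotal states: a route the Worker could
        follow to reach a distant subgoal instead of exploring only locally."""
        frontier, parent = deque([start]), {start: None}
        while frontier:
            node = frontier.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in self.edges.get(node, ()):
                if nxt not in parent:
                    parent[nxt] = node
                    frontier.append(nxt)
        return None  # goal not reachable in the current graph


# Usage: plan a traversal between two pivotal states discovered in the environment.
graph = WorldGraph()
graph.add_traversal("room_A", "corridor")
graph.add_traversal("corridor", "room_B")
print(graph.plan_traversal("room_A", "room_B"))  # ['room_A', 'corridor', 'room_B']

In the full approach, the Manager would name a pivotal state like these as a subgoal, and the Worker would combine such graph traversal with its learned goal-conditioned policy to move between pivotal states.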
