Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. In such hierarchical structures, a higher-level controller solves tasks by iteratively communicating goals that a lower-level policy is trained to reach. Accordingly, the choice of representation -- the mapping from observation space to goal space -- is crucial. To study this problem, we develop a notion of the sub-optimality of a representation, defined in terms of the expected reward of the optimal hierarchical policy using this representation. We derive expressions that bound this sub-optimality and show how these expressions can be translated into representation learning objectives that can be optimized in practice. Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods (see videos at this https URL).
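To make the setup concrete, the sketch below illustrates the kind of two-level, goal-conditioned loop the abstract describes: a representation phi maps observations into goal space, a higher-level controller emits goals in that space at a fixed interval, and a lower-level policy is rewarded for reaching them. This is a minimal illustration under assumed toy dimensions and placeholder (random, untrained) policies and dynamics; none of the names or numbers below come from the paper.

```python
# Minimal sketch (not the paper's implementation) of a goal-conditioned
# two-level hierarchy with a representation phi: observations -> goal space.
# Dimensions, the goal horizon, and the stand-in policies/dynamics are
# illustrative placeholders only.
import numpy as np

rng = np.random.default_rng(0)

obs_dim, phi_dim, act_dim = 6, 2, 2   # assumed toy dimensions
goal_horizon = 10                     # higher level emits a new goal every c steps

# Representation: here a fixed linear map; in practice it would be learned.
W_phi = rng.normal(scale=0.1, size=(phi_dim, obs_dim))
def phi(obs):
    return W_phi @ obs

# Placeholder policies; in practice both levels would be trained RL agents.
def high_level_policy(obs):
    # Proposes a goal expressed in representation (goal) space.
    return rng.normal(size=phi_dim)

def low_level_policy(obs, goal):
    # Should act to move phi(obs) toward the goal; random here.
    return rng.normal(size=act_dim)

def low_level_reward(obs, goal):
    # Intrinsic reward: negative distance to the goal in representation space.
    return -np.linalg.norm(phi(obs) - goal)

def step_env(obs, action):
    # Stand-in dynamics and task reward; a real continuous-control task
    # would replace this.
    next_obs = obs + 0.1 * rng.normal(size=obs_dim)
    task_reward = -np.linalg.norm(next_obs[:2])
    return next_obs, task_reward

obs = np.zeros(obs_dim)
goal = high_level_policy(obs)
for t in range(100):
    if t % goal_horizon == 0:
        goal = high_level_policy(obs)        # higher level re-plans a goal
    action = low_level_policy(obs, goal)     # lower level pursues the goal
    next_obs, task_reward = step_env(obs, action)
    r_lo = low_level_reward(next_obs, goal)  # trains the lower-level policy
    obs = next_obs
```

The key design question the paper studies is how to choose phi: the quality of the representation determines how much task reward the best hierarchical policy built on top of it can achieve.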
