Composing Ensembles of Policies with Deep Reinforcement Learning

Composing elementary skills into complex behaviors that solve challenging problems is one of the key steps toward building intelligent machines. To date, there has been plenty of work on learning individual policies or skills, but comparatively little on composing them for complex decision-making. In this paper, we propose a policy ensemble composition framework that takes the robot's primitive policies and learns to compose them, concurrently or sequentially, through reinforcement learning. We evaluate our method on problems where traditional approaches either fail or require many samples to find a solution. We show that our method not only solves problems requiring both task and motion planning but also exhibits high data efficiency, addressing one of the main limitations of reinforcement learning.
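To make the composition idea concrete, below is a minimal sketch of one plausible realization: a gating network that maps the observation to mixing weights over a fixed set of primitive policies and combines their actions by a weighted sum, with the gate trained by a standard RL algorithm. This is an illustration under assumptions, not the paper's actual architecture; the class name `CompositionPolicy`, the weighted-sum mixing rule, and all dimensions are hypothetical.

```python
# Sketch only: state-dependent mixing of frozen primitive policies' actions.
import torch
import torch.nn as nn


class CompositionPolicy(nn.Module):
    def __init__(self, obs_dim, num_primitives, hidden_dim=64):
        super().__init__()
        # Gating network: maps the observation to mixing weights over the primitives.
        self.gate = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_primitives),
        )

    def forward(self, obs, primitive_actions):
        # primitive_actions: (batch, num_primitives, act_dim), i.e. the actions
        # proposed by each frozen primitive policy for the same observation.
        weights = torch.softmax(self.gate(obs), dim=-1)  # (batch, num_primitives)
        composed = (weights.unsqueeze(-1) * primitive_actions).sum(dim=1)
        return composed, weights


if __name__ == "__main__":
    # Usage example with random stand-ins for the primitives' action proposals.
    obs_dim, act_dim, num_primitives = 10, 4, 3
    policy = CompositionPolicy(obs_dim, num_primitives)
    obs = torch.randn(2, obs_dim)
    primitive_actions = torch.randn(2, num_primitives, act_dim)
    action, weights = policy(obs, primitive_actions)
    print(action.shape, weights.shape)  # torch.Size([2, 4]) torch.Size([2, 3])
```

In such a setup, only the gate's parameters would be updated by the reinforcement-learning objective while the primitives stay fixed; sequential composition could be obtained by committing to the highest-weight primitive for several steps rather than blending at every step.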
