Learning Task-Agnostic Action Spaces for Movement Optimization

We propose a novel method for exploring the dynamics of physically based animated characters and learning a task-agnostic action space that makes movement optimization easier. Like several previous works, we parameterize actions as target states and learn a short-horizon, goal-conditioned low-level control policy that drives the agent's state towards the targets. Our novel contribution is that, using our exploration data, the low-level policy can be learned in a generic manner without any reference movement data. Trained once per agent or simulation environment, the policy improves the efficiency of optimizing both trajectories and high-level policies across multiple tasks and optimization algorithms. We also contribute novel visualizations showing how using target states as actions makes optimized trajectories more robust to disturbances; this manifests as wider optima that are easier to find. Due to its simplicity and generality, the proposed approach should provide a building block that can improve a large variety of movement optimization methods and applications.
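To make the idea of target states as actions concrete, the following is a minimal sketch, not the paper's implementation: a goal-conditioned low-level policy executes each high-level action (a target state) over a short horizon inside a toy environment. All names here (PointMassEnv, low_level_policy, the horizon length) are illustrative assumptions; in the actual method the low-level policy would be a neural network trained from exploration data.

```python
# A minimal sketch: a toy 1D point mass stands in for the physics simulator,
# and a hand-written proportional controller stands in for the learned
# goal-conditioned low-level policy. Illustrative only.
import numpy as np


class PointMassEnv:
    """Toy 1D point mass; state = [position, velocity], action = force."""

    def __init__(self):
        self.state = np.zeros(2)

    def step(self, force, dt=0.05):
        pos, vel = self.state
        vel = vel + force * dt
        pos = pos + vel * dt
        self.state = np.array([pos, vel])
        reward = -abs(pos - 1.0)  # example task reward: reach position 1.0
        return self.state.copy(), reward


def low_level_policy(state, target_state, gain=5.0):
    """Placeholder for a learned goal-conditioned policy pi(a | s, g);
    here a simple proportional controller drives the state toward the target."""
    error = target_state - state
    return float(np.clip(gain * error[0] + error[1], -1.0, 1.0))


def high_level_step(env, target_state, horizon=5):
    """Execute one high-level action (a target state) by rolling out the
    low-level policy for a short horizon; return the reached state and the
    accumulated task reward."""
    total_reward = 0.0
    for _ in range(horizon):
        action = low_level_policy(env.state, target_state)
        _, reward = env.step(action)
        total_reward += reward
    return env.state.copy(), total_reward


# Usage: a high-level optimizer searches over target states instead of raw forces.
env = PointMassEnv()
reached, ret = high_level_step(env, target_state=np.array([1.0, 0.0]))
print(reached, ret)
```

A high-level optimizer, whether a trajectory optimizer (e.g. CMA-ES) or a reinforcement learning policy, would then operate in this target-state action space rather than over raw actuation.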
