Learning Task-Agnostic Action Spaces for Movement Optimization

We propose a novel method for exploring the dynamics of physically based animated characters and learning a task-agnostic action space that makes movement optimization easier. As in several previous papers, we parameterize actions as target states and learn a short-horizon, goal-conditioned low-level control policy that drives the agent's state towards the targets. Our key contribution is that the exploration data allows the low-level policy to be learned in a generic manner, without any reference movement data. Trained once per agent or simulation environment, the policy improves the efficiency of optimizing both trajectories and high-level policies across multiple tasks and optimization algorithms. We also contribute novel visualizations showing how using target states as actions makes optimized trajectories more robust to disturbances; this manifests as wider optima that are easier to find. Due to its simplicity and generality, our proposed approach should provide a building block that can improve a large variety of movement optimization methods and applications.
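To make the target-state action parameterization concrete, the following is a minimal, self-contained Python sketch of the general idea, not the paper's implementation: a hand-coded PD controller stands in for the learned goal-conditioned low-level policy, and plain random search stands in for the high-level optimizer; all names and parameters (step, low_level_policy, rollout, the gains, the goal position) are hypothetical illustrations.

# Toy sketch (not the paper's code) of using target states as actions: a
# goal-conditioned low-level controller tracks target states, and a simple
# high-level search optimizes only the sequence of targets.
import numpy as np

def step(state, force, dt=0.05):
    # 1-D point-mass dynamics; state = [position, velocity].
    pos, vel = state
    vel = vel + force * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def low_level_policy(state, target_state, kp=20.0, kd=5.0):
    # Stand-in for the learned short-horizon goal-conditioned policy:
    # a PD controller that drives the state toward the target state.
    return kp * (target_state[0] - state[0]) + kd * (target_state[1] - state[1])

def rollout(target_states, steps_per_target=10):
    # Execute a high-level plan given as a sequence of target states;
    # return the negative distance to a task goal (position = 1.0).
    state = np.zeros(2)
    for target in target_states:
        for _ in range(steps_per_target):
            force = np.clip(low_level_policy(state, target), -10.0, 10.0)
            state = step(state, force)
    return -abs(state[0] - 1.0)

# The high-level optimization searches only over target states (here random
# search; trajectory optimization, CMA-ES, or RL could be used instead).
rng = np.random.default_rng(0)
best_plan, best_return = None, -np.inf
for _ in range(200):
    plan = rng.uniform(-1.5, 1.5, size=(5, 2))  # 5 target states
    ret = rollout(plan)
    if ret > best_return:
        best_plan, best_return = plan, ret
print("best return:", best_return)

Because the low-level controller already stabilizes the dynamics, the high-level search only has to choose slowly varying target states, which is the property the abstract describes as yielding wider, easier-to-find optima.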
