Learning Task-Agnostic Action Spaces for Movement Optimization

We propose a novel method for exploring the dynamics of physically based animated characters and learning a task-agnostic action space that makes movement optimization easier. As in several previous papers, we parameterize actions as target states and learn a short-horizon, goal-conditioned low-level control policy that drives the agent's state towards the targets. Our key contribution is that the exploration data allows the low-level policy to be learned in a generic manner, without any reference movement data. Trained once per agent or simulation environment, the policy improves the efficiency of optimizing both trajectories and high-level policies across multiple tasks and optimization algorithms. We also contribute novel visualizations showing how using target states as actions makes optimized trajectories more robust to disturbances; this manifests as wider optima that are easier to find. Due to its simplicity and generality, our proposed approach should provide a building block that can improve a large variety of movement optimization methods and applications.
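To make the target-state action parameterization concrete, the following is a minimal, self-contained Python sketch of the general idea, not the paper's implementation: a hand-coded PD controller stands in for the learned goal-conditioned low-level policy, and plain random search stands in for the high-level optimizer; all names and parameters (step, low_level_policy, rollout, the gains, the goal position) are hypothetical illustrations.

# Toy sketch (not the paper's code) of using target states as actions: a
# goal-conditioned low-level controller tracks target states, and a simple
# high-level search optimizes only the sequence of targets.
import numpy as np

def step(state, force, dt=0.05):
    # 1-D point-mass dynamics; state = [position, velocity].
    pos, vel = state
    vel = vel + force * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def low_level_policy(state, target_state, kp=20.0, kd=5.0):
    # Stand-in for the learned short-horizon goal-conditioned policy:
    # a PD controller that drives the state toward the target state.
    return kp * (target_state[0] - state[0]) + kd * (target_state[1] - state[1])

def rollout(target_states, steps_per_target=10):
    # Execute a high-level plan given as a sequence of target states;
    # return the negative distance to a task goal (position = 1.0).
    state = np.zeros(2)
    for target in target_states:
        for _ in range(steps_per_target):
            force = np.clip(low_level_policy(state, target), -10.0, 10.0)
            state = step(state, force)
    return -abs(state[0] - 1.0)

# The high-level optimization searches only over target states (here random
# search; trajectory optimization, CMA-ES, or RL could be used instead).
rng = np.random.default_rng(0)
best_plan, best_return = None, -np.inf
for _ in range(200):
    plan = rng.uniform(-1.5, 1.5, size=(5, 2))  # 5 target states
    ret = rollout(plan)
    if ret > best_return:
        best_plan, best_return = plan, ret
print("best return:", best_return)

Because the low-level controller already stabilizes the dynamics, the high-level search only has to choose slowly varying target states, which is the property the abstract describes as yielding wider, easier-to-find optima.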
