Learning Task-Agnostic Action Spaces for Movement Optimization

We propose a novel method for exploring the dynamics of physically based animated characters and learning a task-agnostic action space that makes movement optimization easier. Like several previous works, we parameterize actions as target states and learn a short-horizon, goal-conditioned low-level control policy that drives the agent's state towards the targets. Our novel contribution is that, using our exploration data, the low-level policy can be learned in a generic manner without any reference movement data. Trained once per agent or simulation environment, the policy improves the efficiency of optimizing both trajectories and high-level policies across multiple tasks and optimization algorithms. We also contribute novel visualizations showing how using target states as actions makes optimized trajectories more robust to disturbances; this manifests as wider optima that are easier to find. Due to its simplicity and generality, the proposed approach should provide a building block that can improve a large variety of movement optimization methods and applications.
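To make the idea of target states as actions concrete, the following is a minimal sketch, not the paper's implementation: a goal-conditioned low-level policy executes each high-level action (a target state) over a short horizon inside a toy environment. All names here (PointMassEnv, low_level_policy, the horizon length) are illustrative assumptions; in the actual method the low-level policy would be a neural network trained from exploration data.

```python
# A minimal sketch: a toy 1D point mass stands in for the physics simulator,
# and a hand-written proportional controller stands in for the learned
# goal-conditioned low-level policy. Illustrative only.
import numpy as np


class PointMassEnv:
    """Toy 1D point mass; state = [position, velocity], action = force."""

    def __init__(self):
        self.state = np.zeros(2)

    def step(self, force, dt=0.05):
        pos, vel = self.state
        vel = vel + force * dt
        pos = pos + vel * dt
        self.state = np.array([pos, vel])
        reward = -abs(pos - 1.0)  # example task reward: reach position 1.0
        return self.state.copy(), reward


def low_level_policy(state, target_state, gain=5.0):
    """Placeholder for a learned goal-conditioned policy pi(a | s, g);
    here a simple proportional controller drives the state toward the target."""
    error = target_state - state
    return float(np.clip(gain * error[0] + error[1], -1.0, 1.0))


def high_level_step(env, target_state, horizon=5):
    """Execute one high-level action (a target state) by rolling out the
    low-level policy for a short horizon; return the reached state and the
    accumulated task reward."""
    total_reward = 0.0
    for _ in range(horizon):
        action = low_level_policy(env.state, target_state)
        _, reward = env.step(action)
        total_reward += reward
    return env.state.copy(), total_reward


# Usage: a high-level optimizer searches over target states instead of raw forces.
env = PointMassEnv()
reached, ret = high_level_step(env, target_state=np.array([1.0, 0.0]))
print(reached, ret)
```

A high-level optimizer, whether a trajectory optimizer (e.g. CMA-ES) or a reinforcement learning policy, would then operate in this target-state action space rather than over raw actuation.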
