Open-Loop Planning in Large-Scale Stochastic Domains

We focus on effective sample-based planning in the face of underactuation, high dimensionality, drift, discrete system changes, and stochasticity. These are hallmark challenges for important problems such as humanoid locomotion. To ensure broad applicability, we assume domain expertise is minimal and limited to a generative model. To keep the method responsive, we require computational costs that scale linearly with the number of samples drawn from the generative model. We present a concrete method that satisfies all of these requirements: a receding-horizon open-loop planner that employs cross-entropy optimization for policy construction. In simulation, we empirically demonstrate near-optimal decisions in a small domain and effective locomotion in several challenging humanoid control tasks.
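To make the approach concrete, the following is a minimal sketch (not the authors' implementation) of receding-horizon open-loop planning with cross-entropy optimization. It assumes only a generative model `step(state, action) -> (next_state, reward)` and a Gaussian sampling distribution over fixed-horizon action sequences; all function names, parameters, and defaults here are illustrative assumptions.

```python
import numpy as np

def cross_entropy_plan(step, state, action_dim, horizon=20,
                       n_samples=100, n_elite=10, n_iters=5,
                       init_std=1.0):
    """Open-loop planning: optimize a fixed-horizon action sequence
    with the cross-entropy method, using only a generative model.
    (Illustrative sketch; names and defaults are assumptions.)"""
    mean = np.zeros((horizon, action_dim))
    std = np.full((horizon, action_dim), init_std)
    for _ in range(n_iters):
        # Sample candidate open-loop plans from the current Gaussian.
        plans = mean + std * np.random.randn(n_samples, horizon, action_dim)
        returns = np.empty(n_samples)
        for i, plan in enumerate(plans):
            s, total = state, 0.0
            for a in plan:              # roll the plan out in the model
                s, r = step(s, a)
                total += r
            returns[i] = total
        # Refit the sampling distribution to the elite (highest-return) plans.
        elite = plans[np.argsort(returns)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    # Receding horizon: execute only the first action, then replan
    # from the next observed state.
    return mean[0]
```

Note that each iteration's cost is dominated by the `n_samples * horizon` generative-model calls, so total cost scales linearly with the number of samples, matching the responsiveness requirement stated above.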
