Generalizing Over Uncertain Dynamics for Online Trajectory Generation

We present an algorithm which learns an online trajectory generator that can generalize over varying and uncertain dynamics. When the dynamics is certain, the algorithm generalizes across model parameters. When the dynamics is partially observable, the algorithm generalizes across different observations. To do this, we employ recent advances in supervised imitation learning to learn a trajectory generator from a set of example trajectories computed by a trajectory optimizer. In experiments in two simulated domains, it finds solutions that are nearly as good as, and sometimes better than, those obtained by calling the trajectory optimizer on line. The online execution time is dramatically decreased, and the off-line training time is reasonable.

[1]  Sergey Levine,et al.  Guided Policy Search , 2013, ICML.

[2]  Karsten Berns,et al.  Modeling, Simulation and Optimization of Bipedal Walking , 2013 .

[3]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[4]  Ian R. Manchester,et al.  LQR-trees: Feedback Motion Planning via Sums-of-Squares Verification , 2010, Int. J. Robotics Res..

[5]  J A Bagnell,et al.  An Invitation to Imitation , 2015 .

[6]  Anil V. Rao,et al.  Practical Methods for Optimal Control Using Nonlinear Programming , 1987 .

[7]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[8]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[9]  Pieter Abbeel,et al.  Autonomous Helicopter Aerobatics through Apprenticeship Learning , 2010, Int. J. Robotics Res..

[10]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[11]  Andrew G. Barto,et al.  Lyapunov Design for Safe Reinforcement Learning , 2003, J. Mach. Learn. Res..

[12]  Michael A. Saunders,et al.  SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization , 2002, SIAM J. Optim..

[13]  Paul T. Boggs,et al.  Sequential Quadratic Programming , 1995, Acta Numerica.

[14]  Nolan Wagener,et al.  Learning contact-rich manipulation skills with guided policy search , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Martial Hebert,et al.  Learning monocular reactive UAV control in cluttered natural environments , 2012, 2013 IEEE International Conference on Robotics and Automation.

[16]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.

[17]  Sergey Levine,et al.  Learning Complex Neural Network Policies with Trajectory Optimization , 2014, ICML.

[18]  Pieter Abbeel,et al.  Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations , 2010, 2010 IEEE International Conference on Robotics and Automation.

[19]  Christopher G. Atkeson,et al.  Trajectory-Based Dynamic Programming , 2013 .

[20]  J. Betts Survey of Numerical Methods for Trajectory Optimization , 1998 .

[21]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[22]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[23]  Joelle Pineau,et al.  Maximum Mean Discrepancy Imitation Learning , 2013, Robotics: Science and Systems.

[24]  Emanuel Todorov,et al.  Combining the benefits of function approximation and trajectory optimization , 2014, Robotics: Science and Systems.

[25]  Russ Tedrake,et al.  Whole-body motion planning with centroidal dynamics and full kinematics , 2014, 2014 IEEE-RAS International Conference on Humanoid Robots.

[26]  Russ Tedrake,et al.  A direct method for trajectory optimization of rigid bodies through contact , 2014, Int. J. Robotics Res..

[27]  Daniela Rus,et al.  Dynamics and trajectory optimization for a soft spatial fluidic elastomer manipulator , 2016, Int. J. Robotics Res..

[28]  Christopher G. Atkeson,et al.  Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming , 1993, NIPS.

[29]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[30]  Oskar von Stryk,et al.  Direct and indirect methods for trajectory optimization , 1992, Ann. Oper. Res..