Multimodal trajectory optimization for motion planning

Existing motion planning methods often have two drawbacks: (1) the goal configuration must be specified by the user, and (2) only a single solution is generated for a given condition. In practice, a task can usually be achieved from multiple goal configurations. Although the choice of goal configuration significantly affects the quality of the resulting trajectory, it is not trivial for a user to specify the optimal one. In addition, the objective function used in trajectory optimization is often non-convex and can have multiple solutions with comparable costs. In this study, we propose a framework that finds multiple trajectories corresponding to the different modes of the cost function. We reduce the problem of identifying these modes to that of estimating the density of a distribution induced by the cost function. The proposed framework enables users to select a preferable solution from multiple candidate trajectories, which makes it easier to tune the cost function and obtain a satisfactory solution. We evaluated the proposed method on motion planning tasks in 2D and 3D space, and our experiments show that the proposed algorithm can find multiple solutions for these tasks.
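
The paper's exact algorithm is not reproduced here, but the core idea of reducing mode identification to density estimation over a cost-induced distribution can be illustrated with a minimal Python sketch on a 2D toy problem. Everything below (the toy cost, the exp(-cost / temperature) weighting, the importance resampling, and the use of a Gaussian mixture to represent the modes) is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch: find multiple trajectory modes by turning the cost into a
# density, sampling from it, and fitting a mixture model. Illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
T = 10                                   # waypoints per trajectory
start = np.array([0.0, 0.0])
obstacle_center, obstacle_radius = np.array([1.0, 0.0]), 0.3

def cost(traj):
    """Toy non-convex cost: reach x = 2 (any y), stay smooth, avoid the obstacle."""
    goal_term = (traj[-1, 0] - 2.0) ** 2
    smooth_term = np.sum(np.diff(traj, axis=0) ** 2)
    clearance = np.linalg.norm(traj - obstacle_center, axis=1)
    obstacle_term = np.sum(np.maximum(0.0, obstacle_radius - clearance) ** 2)
    return goal_term + 0.5 * smooth_term + 50.0 * obstacle_term

# 1) Sample candidate trajectories: noisy straight lines from the start to random endpoints.
n_samples = 2000
endpoints = rng.uniform([1.5, -1.5], [2.5, 1.5], size=(n_samples, 2))
base = start + np.linspace(0.0, 1.0, T)[None, :, None] * (endpoints - start)[:, None, :]
trajs = base + 0.05 * rng.standard_normal(base.shape)

# 2) Turn costs into an (unnormalized) density and resample trajectories in proportion to it.
costs = np.array([cost(tr) for tr in trajs])
weights = np.exp(-(costs - costs.min()) / 0.5)        # temperature 0.5 (assumed)
probs = weights / weights.sum()
resampled = trajs[rng.choice(n_samples, size=n_samples, p=probs)]

# 3) Fit a mixture to the resampled trajectories; each component mean is one mode,
#    i.e. one candidate trajectory the user can choose from.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(resampled.reshape(n_samples, -1))
candidate_trajectories = gmm.means_.reshape(2, T, 2)
for i, tr in enumerate(candidate_trajectories):
    print(f"mode {i}: cost = {cost(tr):.3f}, final waypoint = {tr[-1]}")
```

On this toy problem the two mixture components typically correspond to trajectories that pass above and below the obstacle, which is the kind of multimodal output the proposed framework is meant to expose to the user.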
