POMDP manipulation via trajectory optimization

Efficient object manipulation based only on force feedback typically requires a plan of actively contact-seeking actions that reduce uncertainty over the true environment model. In principle, this problem can be formulated as a full partially observable Markov decision process (POMDP) whose observations are sensed forces indicating the presence or absence of contact with objects. Such a naive formulation, however, yields a very large POMDP with high-dimensional continuous state, action, and observation spaces, which is practically intractable to solve. The difficulty stems from three challenges: 1) uncertainty over discontinuous contacts with objects; 2) high-dimensional continuous spaces; 3) optimization over not only trajectory cost but also execution time. Trajectory optimization, a powerful model-based method for motion generation, handles the last two issues effectively by computing locally optimal trajectories. This paper integrates the advantages of trajectory optimization into existing POMDP solvers: the full POMDP is solved with sample-based approaches in which each sampled model is evaluated quickly via trajectory optimization instead of by simulating a large number of rollouts. To further accelerate the solver, we incorporate temporal abstraction, i.e. macro (temporally extended) actions, into the POMDP model. We demonstrate the proposed method on a simulated 7-DoF KUKA arm and a physical Willow Garage PR2 platform. The results show that our method effectively seeks contacts in complex scenarios and achieves near-optimal path-planning performance.
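To make the core idea concrete, the following Python sketch illustrates (under loose assumptions, not as the authors' implementation) how a sample-based POMDP solver can score candidate macro actions by solving one trajectory optimization problem per sampled environment model rather than simulating many rollouts. All names here (sample_models, trajectory_optimize, MacroAction, EnvModel) are hypothetical placeholders.

    # Minimal sketch of sample-based macro-action evaluation via
    # trajectory optimization. Hypothetical interfaces throughout.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class MacroAction:
        name: str                  # e.g. "guarded move toward surface"
        goal_constraint: Callable  # terminal constraint passed to the optimizer

    def evaluate_macro_action(action: MacroAction,
                              models: List["EnvModel"],
                              trajectory_optimize: Callable) -> float:
        """Score a macro action under model uncertainty.

        Each sampled model is scored once by solving a local trajectory
        optimization problem whose objective combines trajectory cost and
        execution time, instead of averaging many simulated rollouts.
        """
        total = 0.0
        for model in models:
            traj, cost = trajectory_optimize(model, action.goal_constraint)
            total += cost
        return total / len(models)

    def select_action(belief, macro_actions, sample_models, trajectory_optimize,
                      n_samples=20):
        # Sample candidate environment models (e.g. object poses) from the belief.
        models = sample_models(belief, n_samples)
        # Choose the macro action with the lowest expected optimized cost.
        return min(macro_actions,
                   key=lambda a: evaluate_macro_action(a, models,
                                                       trajectory_optimize))

Replacing per-model rollout simulation with a single local optimization is what keeps the evaluation of each sampled model cheap; the temporal abstraction enters through the small, discrete set of macro actions being compared.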
