Multi-armed bandit models for 2D grasp planning with uncertainty

For applications such as warehouse order fulfillment, robot grasps must be robust to uncertainty arising from sensing, mechanics, and control. One way to achieve robustness is to evaluate candidate grasps by sampling perturbations in shape, pose, and gripper approach, computing the probability of force closure for each candidate, and selecting the grasp with the highest expected quality. Because evaluating the quality of each grasp is computationally demanding, prior work has turned to cloud computing. To improve computational efficiency and extend this work, we consider how Multi-Armed Bandit (MAB) models for optimizing decisions can be applied in this context. We formulate robust grasp planning as a MAB problem and evaluate convergence times toward an optimal grasp candidate using 100 object shapes from the Brown Vision 2D Lab Dataset with 1000 grasp candidates per object. We consider the case where shape uncertainty is represented as a Gaussian process implicit surface (GPIS), with Gaussian uncertainty in pose, gripper approach angle, and coefficient of friction. We find that the Thompson Sampling and Gittins index MAB methods converged to within 3% of the optimal grasp up to 10x faster than uniform allocation and 5x faster than iterative pruning.
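The probability of force closure for a single candidate (written P_F here) can be estimated by plain Monte Carlo sampling. The sketch below is illustrative only and is not the GPIS-based model described above. It assumes a two-finger planar grasp and uses the classical two-contact criterion that the segment joining the contacts must lie inside both friction cones of half-angle arctan(mu). It also crudely models all uncertainty as Gaussian noise on the two contact-normal angles and on the friction coefficient mu. The function names `is_force_closure` and `estimate_pf` and all noise parameters are hypothetical. Spending the same number of such samples on every candidate corresponds to the uniform-allocation baseline.

```python
import math
import random


def is_force_closure(p1, n1, p2, n2, mu):
    """Planar two-contact force-closure test with Coulomb friction (assumed model).

    p1, p2: contact points (x, y); n1, n2: unit inward surface normals.
    The grasp is force closure if the segment joining the contacts lies
    strictly inside both friction cones of half-angle atan(mu).
    """
    if mu <= 0.0:
        return False  # two frictionless point contacts cannot achieve force closure
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return False
    ux, uy = dx / dist, dy / dist
    half_angle = math.atan(mu)
    # Angle between each inward normal and the direction toward the other contact.
    a1 = math.acos(max(-1.0, min(1.0, n1[0] * ux + n1[1] * uy)))
    a2 = math.acos(max(-1.0, min(1.0, -(n2[0] * ux + n2[1] * uy))))
    return a1 < half_angle and a2 < half_angle


def estimate_pf(p1, theta1, p2, theta2, angle_sigma, mu_mean, mu_sigma,
                n_samples, seed=0):
    """Monte Carlo estimate of P_F and its standard error (hypothetical helper).

    theta1, theta2: nominal inward-normal angles in radians. Each sample adds
    Gaussian noise to both angles (a stand-in for shape, pose, and approach
    uncertainty) and draws the friction coefficient from a Gaussian.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_samples):
        t1 = theta1 + rng.gauss(0.0, angle_sigma)
        t2 = theta2 + rng.gauss(0.0, angle_sigma)
        mu = rng.gauss(mu_mean, mu_sigma)
        n1 = (math.cos(t1), math.sin(t1))
        n2 = (math.cos(t2), math.sin(t2))
        if is_force_closure(p1, n1, p2, n2, mu):
            successes += 1
    p_hat = successes / n_samples
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_samples)
    return p_hat, std_err


# Nominally antipodal grasp on a 1-unit-wide object: the normals face each other.
p_hat, std_err = estimate_pf((0.0, 0.0), 0.0, (1.0, 0.0), math.pi,
                             angle_sigma=0.2, mu_mean=0.4, mu_sigma=0.05,
                             n_samples=5000)
print(f"P_F ~ {p_hat:.3f} +/- {std_err:.3f}")
```

The standard error shrinks only as 1/sqrt(n), which is why evaluating every one of 1000 candidates to high precision is expensive and why adaptive allocation is attractive.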

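In the MAB formulation, each grasp candidate is an arm. Each pull samples one perturbation and returns 1 if the perturbed grasp is in force closure, so an arm's mean reward is its probability of force closure. The following is a minimal sketch of Beta-Bernoulli Thompson Sampling under that framing. It assumes uniform Beta(1, 1) priors. The `true_pf` values are synthetic stand-ins for the outcome of a real force-closure check, and the function name is hypothetical. The Gittins index policy and iterative pruning are not shown.

```python
import random


def thompson_sampling(true_pf, n_pulls, seed=0):
    """Beta-Bernoulli Thompson Sampling over grasp candidates (arms).

    true_pf: synthetic per-candidate probabilities of force closure, used only
        to simulate the binary outcome of one sampled perturbation; a real
        planner would run a force-closure check here instead.
    Returns (best_index, pulls): best_index has the highest posterior mean and
    pulls[i] counts how often candidate i was evaluated.
    """
    rng = random.Random(seed)
    k = len(true_pf)
    alpha = [1.0] * k  # Beta(1, 1) prior: 1 + observed successes
    beta = [1.0] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(n_pulls):
        # Draw a plausible success probability for every candidate from its
        # posterior and evaluate the candidate whose draw is largest.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        i = max(range(k), key=draws.__getitem__)
        pulls[i] += 1
        # Simulated force-closure outcome under one random perturbation.
        if rng.random() < true_pf[i]:
            alpha[i] += 1.0
        else:
            beta[i] += 1.0
    posterior_mean = [alpha[i] / (alpha[i] + beta[i]) for i in range(k)]
    best = max(range(k), key=posterior_mean.__getitem__)
    return best, pulls


best, pulls = thompson_sampling([0.2, 0.5, 0.9, 0.6], n_pulls=2000)
print(best, pulls)
```

Unlike uniform allocation, which spends n_pulls / k samples on each candidate, Thompson Sampling concentrates evaluations on candidates whose posterior still gives them a realistic chance of being best.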