Budgeted Multi-Armed Bandit Models for Sample-Based Grasp Planning in the Presence of Uncertainty

Sampling perturbations in shape, state, and control can facilitate grasp planning in the presence of uncertainty arising from noise, occlusions, and surface properties such as transparency and specularities. Monte-Carlo sampling is computationally demanding, even for planar models. We consider an alternative based on the multi-armed bandit (MAB) model for making sequential decisions, which can apply to a variety of uncertainty models. We formulate grasp planning as a “budgeted multi-armed bandit model” (BMAB) with finite stopping time to minimize “simple regret”, the difference between the expected quality of the best grasp and the expected quality of the grasp evaluated at the stopping time. To evaluate MAB-based sampling, we compare it with Monte-Carlo sampling for grasping an uncertain planar object defined by a Gaussian process implicit surface (GPIS), although the method is applicable to other models of uncertainty. We derive distributions on contact points, surface normals, and center of mass and use these to solve the associated MAB model, finding that it computes grasps of similar quality to Monte-Carlo sampling and can reduce computation time by an order of magnitude. This suggests a number of new research questions about how MAB can be applied to other models of uncertainty and how different MAB solution techniques can be applied to further reduce computation.
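As a rough illustration of the budgeted formulation described above, the following Python sketch treats each candidate grasp as an arm, spends a fixed sampling budget adaptively, and then reports the simple regret of the recommended grasp. It rests on several assumptions that are not taken from the paper: grasp quality is simplified to a Bernoulli success probability, Thompson sampling with Beta posteriors is used as one possible allocation rule, and the names `thompson_budgeted` and `simple_regret` are hypothetical. It is a minimal sketch of the idea, not the authors' implementation.

```python
import random


def thompson_budgeted(success_probs, budget, rng):
    """Spend `budget` samples across candidate grasps, then recommend one.

    `success_probs` holds the (normally unknown) success probability of each
    candidate grasp; it is only used here to simulate sample evaluations.
    """
    n = len(success_probs)
    wins = [0] * n
    losses = [0] * n
    for _ in range(budget):
        # Draw a plausible success rate for each arm from its Beta posterior.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n)]
        arm = max(range(n), key=draws.__getitem__)
        # Simulated evaluation of one perturbed sample of the chosen grasp.
        if rng.random() < success_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    # At the stopping time, recommend the arm with the highest posterior mean.
    means = [(wins[i] + 1) / (wins[i] + losses[i] + 2) for i in range(n)]
    best = max(range(n), key=means.__getitem__)
    pulls = [wins[i] + losses[i] for i in range(n)]
    return best, pulls


def simple_regret(success_probs, recommended):
    """Expected quality of the best grasp minus that of the recommended one."""
    return max(success_probs) - success_probs[recommended]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]  # illustrative values, not data from the paper
    best, pulls = thompson_budgeted(probs, budget=200, rng=rng)
    print("recommended grasp:", best)
    print("samples per grasp:", pulls)
    print("simple regret:", simple_regret(probs, best))
```

Uniform Monte-Carlo sampling would instead give every candidate grasp the same number of samples; the adaptive rule tends to concentrate samples on the grasps that look most promising, which is the source of the computational savings the abstract reports.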
