Learning to Manipulate Unknown Objects in Clutter by Reinforcement

We present a fully autonomous robotic system for grasping objects in dense clutter. The objects are unknown and have arbitrary shapes. Therefore, we cannot rely on prior models. Instead, the robot learns online, from scratch, to manipulate the objects by trial and error. Grasping objects in clutter is significantly harder than grasping isolated objects, because the robot needs to push and move objects around in order to create sufficient space for the fingers. These pre-grasping actions do not have an immediate utility, and may result in unnecessary delays. The utility of a pre-grasping action can be measured only by looking at the complete chain of consecutive actions and effects. This is a sequential decision-making problem that can be cast in the reinforcement learning framework. We solve this problem by learning the stochastic transitions between the observed states, using nonparametric density estimation. The learned transition function is used only for re-calculating the values of the executed actions in the observed states, with different policies. Values of new state-actions are obtained by regressing the values of the executed actions. The state of the system at a given time is a depth (3D) image of the scene. We use spectral clustering for detecting the different objects in the image. The performance of our system is assessed on a robot with real-world objects.

[1]  Csaba Szepesvári,et al.  Model Selection in Reinforcement Learning , 2011, Machine Learning.

[2]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[3]  John Langford,et al.  Cover trees for nearest neighbor , 2006, ICML.

[4]  Abdeslam Boularias,et al.  Efficient Optimization for Autonomous Robotic Manipulation of Natural Objects , 2014, AAAI.

[5]  Florentin Wörgötter,et al.  Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Manuela M. Veloso,et al.  Push-manipulation of complex passive mobile objects using experimentally acquired motion models , 2015, Auton. Robots.

[7]  Danica Kragic,et al.  Data-Driven Grasp Synthesis—A Survey , 2013, IEEE Transactions on Robotics.

[8]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Oliver Kroemer,et al.  Learning robot grasping from 3-D images with Markov Random Fields , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[11]  Oliver Kroemer,et al.  Learning grasp affordance densities , 2011, Paladyn J. Behav. Robotics.

[12]  Siddhartha S. Srinivasa,et al.  Physics-Based Grasp Planning Through Clutter , 2012, Robotics: Science and Systems.

[13]  Siddhartha S. Srinivasa,et al.  CHOMP: Gradient optimization techniques for efficient motion planning , 2009, 2009 IEEE International Conference on Robotics and Automation.

[14]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[15]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[16]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[17]  J. Andrew Bagnell,et al.  Robust Object Grasping using Force Compliant Motion Primitives , 2012, Robotics: Science and Systems.

[18]  Liming Xiang,et al.  Kernel-Based Reinforcement Learning , 2006, ICIC.

[19]  David Wingate,et al.  A Physics-Based Model Prior for Object-Oriented MDPs , 2014, ICML.

[20]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[21]  Ashutosh Saxena,et al.  Special issue on autonomous grasping and manipulation , 2014, Auton. Robots.

[22]  Ashutosh Saxena,et al.  Robotic Grasping of Novel Objects using Vision , 2008, Int. J. Robotics Res..