Learning task-oriented grasping for tool manipulation from simulated self-supervision

Tool manipulation is vital for enabling robots to complete challenging tasks. It requires reasoning about the desired effect of the task and, in turn, grasping and manipulating the tool appropriately to achieve that effect. Most work in robotics has focused on task-agnostic grasping, which optimizes grasp robustness alone without considering the subsequent manipulation task. In this article, we propose the Task-Oriented Grasping Network (TOG-Net) to jointly optimize both task-oriented grasping of a tool and the manipulation policy for that tool. The model is trained with large-scale simulated self-supervision on procedurally generated tool objects. We perform both simulated and real-world experiments on two tool-based manipulation tasks, sweeping and hammering, where our model achieves overall task success rates of 71.1% and 80.0%, respectively.
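The core idea of the abstract — ranking grasps by joint grasp and task success rather than robustness alone — can be illustrated with a minimal sketch. All names, scores, and the independence assumption below are illustrative placeholders, not the paper's actual TOG-Net interface:

```python
import numpy as np

def select_task_oriented_grasp(grasp_success, task_success):
    """Return the index of the candidate grasp maximizing joint success.

    grasp_success: predicted grasp-robustness probability per candidate.
    task_success:  predicted task-success probability per candidate,
                   conditioned on holding the tool at that grasp.
    """
    grasp_success = np.asarray(grasp_success, dtype=float)
    task_success = np.asarray(task_success, dtype=float)
    # Treating the two predictions as independent, the joint probability
    # of a successful grasp followed by a successful task is the product.
    joint = grasp_success * task_success
    return int(np.argmax(joint))

# Toy hammering example: a very robust grasp near the hammer head (0.9)
# makes the task nearly impossible (0.1), while a slightly less robust
# handle grasp (0.8) enables it (0.9). Task-agnostic selection picks the
# head grasp; task-oriented selection picks the handle grasp.
best = select_task_oriented_grasp([0.9, 0.8], [0.1, 0.9])
```

The toy example shows why task-agnostic grasping can fail downstream: argmax over robustness alone would choose index 0, while the joint objective chooses index 1.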