Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

Tool manipulation is vital for enabling robots to complete challenging task goals. It requires reasoning about the desired effect of the task and, accordingly, grasping and manipulating the tool so as to achieve that effect. Task-agnostic grasping optimizes for grasp robustness while ignoring crucial task-specific constraints. In this paper, we propose the Task-Oriented Grasping Network (TOG-Net) to jointly optimize both task-oriented grasping of a tool and the manipulation policy for that tool. The model is trained with large-scale simulated self-supervision on procedurally generated tool objects. We perform both simulated and real-world experiments on two tool-based manipulation tasks: sweeping and hammering. Our model achieves an overall task success rate of 71.1% for sweeping and 80.0% for hammering. Supplementary material is available at: this http URL
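The core idea of task-oriented grasping, as opposed to task-agnostic grasping, can be illustrated with a minimal sketch: instead of ranking candidate grasps by robustness alone, rank them by a predicted probability of *task* success. The function names and the toy scorer below are hypothetical illustrations, not the paper's actual architecture; in TOG-Net the scorer would be a learned network conditioned on an observation of the tool.

```python
import numpy as np

def select_task_oriented_grasp(grasp_candidates, task_score_fn):
    """Score each candidate grasp by predicted task success, return the best.

    grasp_candidates: array of shape (N, D) of grasp parameters
    task_score_fn:    callable mapping one grasp to a predicted task-success
                      score (a learned, task-conditioned model in practice)
    """
    scores = np.array([task_score_fn(g) for g in grasp_candidates])
    best = int(np.argmax(scores))
    return grasp_candidates[best], float(scores[best])

# Toy stand-in scorer: prefer grasps near x = 0.8 along the handle,
# mimicking the intuition that hammering favors gripping far from the head.
def toy_hammer_score(grasp):
    return 1.0 / (1.0 + abs(grasp[0] - 0.8))

candidates = np.array([[0.1, 0.0], [0.5, 0.0], [0.8, 0.0]])
grasp, score = select_task_oriented_grasp(candidates, toy_hammer_score)
print(grasp, score)  # the candidate at x = 0.8 wins with score 1.0
```

A task-agnostic pipeline would replace `task_score_fn` with a pure grasp-robustness metric, which is exactly the mismatch the abstract points out: the most robust grasp (e.g., around a hammer's head) can make the downstream task impossible.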

[1] J. Gibson. The Ecological Approach to Visual Perception, 1979.

[2] S. Shankar Sastry, et al. Task-oriented optimal grasping by multifingered robot hands, 1987, IEEE J. Robotics Autom.

[3] John F. Canny, et al. Planning optimal grasps, 1992, Proceedings 1992 IEEE International Conference on Robotics and Automation.

[4] Kevin M. Lynch, et al. Stable Pushing: Mechanics, Controllability, and Planning, 1995, Int. J. Robotics Res.

[5] Matthew M. Williamson, et al. Robot arm control exploiting natural dynamics, 1999.

[6] Bernhard P. Wrobel, et al. Multiple View Geometry in Computer Vision, 2001, Künstliche Intell.

[7] Chris Baber. Cognition and Tool Use: Forms of Engagement in Human and Animal Use of Tools, 2003.

[8] Henrik I. Christensen, et al. Automatic grasp planning using shape primitives, 2003, 2003 IEEE International Conference on Robotics and Automation.

[9] Giulio Sandini, et al. Learning about objects through action - initial steps towards artificial cognition, 2003, 2003 IEEE International Conference on Robotics and Automation.

[10] Alexander Stoytchev, et al. Behavior-Grounded Representation of Tool Affordances, 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[11] Helge J. Ritter, et al. Task-oriented quality measures for dextrous grasping, 2005, 2005 International Symposium on Computational Intelligence in Robotics and Automation.

[12] Angel P. del Pobil, et al. Task-Oriented Grasping using Hand Preshapes and Task Frames, 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[13] C. Sammut, et al. An Architecture for Tool Use and Learning in Robots, 2007.

[14] Ashutosh Saxena, et al. Robotic Grasping of Novel Objects using Vision, 2008, Int. J. Robotics Res.

[15] Faouzi Ghorbel, et al. A simple and efficient approach for 3D mesh approximate convex decomposition, 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[16] Matei T. Ciocarlie, et al. The Columbia grasp database, 2009, 2009 IEEE International Conference on Robotics and Automation.

[17] Matei T. Ciocarlie, et al. Hand Posture Subspaces for Dexterous Robotic Grasping, 2009, Int. J. Robotics Res.

[18] Danica Kragic, et al. Learning task constraints for robot grasping using graphical models, 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[19] F. Osiurak, et al. Grasping the affordances, understanding the reasoning: toward a dialectical theory of human tool use, 2010, Psychological Review.

[20] Tetsunari Inamura, et al. Learning of Tool Affordances for autonomous tool manipulation, 2011, 2011 IEEE/SICE International Symposium on System Integration (SII).

[21] Siddhartha S. Srinivasa, et al. A Framework for Push-Grasping in Clutter, 2011, Robotics: Science and Systems.

[22] Yun Jiang, et al. Learning Object Arrangements in 3D Scenes using Human Context, 2012, ICML.

[23] Wied Ruijssenaars, et al. Encyclopedia of the Sciences of Learning, 2012.

[24] Peter K. Allen, et al. Semantic grasping: Planning robotic grasps functionally suitable for an object manipulation task, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[25] Peter K. Allen, et al. Pose error robust grasping from contact wrench space metrics, 2012, 2012 IEEE International Conference on Robotics and Automation.

[26] Alberto Rodriguez, et al. From caging to grasping, 2012, Int. J. Robotics Res.

[27] Markus Vincze, et al. AfRob: The affordance network ontology for robots, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[28] Alexander Herzog, et al. Template-based learning of grasp selection, 2012, 2012 IEEE International Conference on Robotics and Automation.

[29] Dirk Kraft, et al. A Survey of the Ontogeny of Tool Use: From Sensorimotor Experience to Planning, 2013, IEEE Transactions on Autonomous Mental Development.

[30] Hema Swetha Koppula, et al. Learning human activities and object affordances from RGB-D videos, 2013, Int. J. Robotics Res.

[31] Danica Kragic, et al. Data-Driven Grasp Synthesis—A Survey, 2013, IEEE Transactions on Robotics.

[32] J. Andrew Bagnell, et al. Perceiving, learning, and exploiting object affordances for autonomous pile manipulation, 2014, Auton. Robots.

[33] Li Fei-Fei, et al. Reasoning about Object Affordances in a Knowledge Base Representation, 2014, ECCV.

[34] Honglak Lee, et al. Deep learning for detecting robotic grasps, 2013, Int. J. Robotics Res.

[35] Manuela M. Veloso, et al. Push-manipulation of complex passive mobile objects using experimentally acquired motion models, 2015, Auton. Robots.

[36] Song-Chun Zhu, et al. Understanding tools: Task-oriented object modeling, learning and recognition, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[38] Jeannette Bohg, et al. Leveraging big data for grasp planning, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[39] Giorgio Metta, et al. Self-supervised learning of grasp dependent tool affordances on the iCub humanoid robot, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[40] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Henrik I. Christensen, et al. Grasping for a Purpose: Using Task Goals for Efficient Manipulation Planning, 2016, ArXiv.

[42] Abhinav Gupta, et al. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[43] Mathieu Aubry, et al. Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[44] Larry H. Matthies, et al. Task-oriented grasping with semantic and geometric scene understanding, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[45] Kate Saenko, et al. Learning a visuomotor controller for real world robotic grasping using simulated depth images, 2017, CoRL.

[46] Xinyu Liu, et al. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, 2017, Robotics: Science and Systems.

[47] Kate Saenko, et al. Grasp Pose Detection in Point Clouds, 2017, Int. J. Robotics Res.

[48] Sergey Levine, et al. End-to-End Learning of Semantic Grasping, 2017, CoRL.

[49] Wolfram Burgard, et al. Learning to Singulate Objects using a Push Proposal Network, 2017, ISRR.

[50] Giorgio Metta, et al. Self-supervised learning of tool affordances from 3D tool representation through parallel SOM mapping, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[51] Anca D. Dragan, et al. Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[52] Peter I. Corke, et al. Cartman: The Low-Cost Cartesian Manipulator that Won the Amazon Robotics Challenge, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[53] Sergey Levine, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016, Int. J. Robotics Res.

[54] Arkanath Pathak, et al. Learning 6-DOF Grasping Interaction via Deep Geometry-Aware 3D Representations, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[55] Darwin G. Caldwell, et al. AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[56] Sergey Levine, et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[57] Wojciech Zaremba, et al. Domain Randomization and Generative Models for Robotic Grasping, 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[58] Mrinal Kalakrishnan, et al. Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[59] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.