Ab Initio Particle-based Object Manipulation

Particle-based Object Manipulation (PROMPT) is a new method for robot manipulation of novel objects, without prior object models or pre-training on a large object data set. The key element of PROMPT is a particle-based object representation: each particle represents a point in an object, together with the point's local geometric, physical, and other features, as well as its relations with other particles. The particle representation connects visual perception with robot control. Like data-driven methods, PROMPT infers the object representation online, in real time, from visual sensing. Like model-based methods, PROMPT leverages the particle representation to reason about the object's geometry and dynamics and to choose suitable manipulation actions. PROMPT thus combines the strengths of both model-based and data-driven methods. We show empirically that PROMPT handles a variety of everyday objects, some of them transparent, and a range of manipulation tasks, including grasping and pushing. Our experiments also show that PROMPT outperforms a state-of-the-art data-driven grasping method on everyday household objects, even though it uses no offline training data. The code and a demonstration video are available online.
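To make the representation concrete, below is a minimal Python sketch of a particle-based object representation of the kind the abstract describes. The field names, the choice of surface normals as the geometric feature, and the radius-based neighbor graph are illustrative assumptions; the abstract does not specify PROMPT's actual data structures.

```python
# Hypothetical sketch of a particle-based object representation.
# Field names and the neighbor construction are assumptions for
# illustration, not PROMPT's actual implementation.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Particle:
    position: np.ndarray                            # 3D point in the object
    geometry: np.ndarray                            # local geometric feature, e.g. surface normal
    physics: dict = field(default_factory=dict)     # e.g. {"mass": ..., "friction": ...}
    neighbors: list = field(default_factory=list)   # indices of related particles

def particles_from_points(points: np.ndarray, normals: np.ndarray,
                          radius: float = 0.01) -> list:
    """Build a particle set from an estimated point cloud.

    Each particle stores its position, a local geometric feature
    (here the surface normal), and links to all particles within
    `radius`, which stand in for the pairwise relations.
    """
    particles = [Particle(p, n) for p, n in zip(points, normals)]
    for i, pi in enumerate(particles):
        dists = np.linalg.norm(points - pi.position, axis=1)
        pi.neighbors = [j for j in np.flatnonzero(dists < radius) if j != i]
    return particles
```

A structure like this is what lets a planner reason jointly about geometry (particle positions and normals) and dynamics (per-particle physical attributes propagated through the neighbor relations), which is the bridge between perception and control that the method relies on.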
