Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains

To aid humans in everyday tasks, robots need to know which objects exist in the scene, where they are, and how to grasp and manipulate them in different situations. Object recognition and grasping are therefore two key functionalities for autonomous robots. Most state-of-the-art approaches treat object recognition and grasping as two separate problems, even though both use visual input. Furthermore, the knowledge of the robot is fixed after the training phase, so when the robot encounters new object categories it must be retrained to incorporate the new information without catastrophic forgetting. To address this problem, we propose a deep learning architecture with an augmented memory capacity that handles open-ended object recognition and grasping simultaneously. In particular, our approach takes multiple views of an object as input and jointly estimates a pixel-wise grasp configuration and a deep scale- and rotation-invariant representation as output. The obtained representation is then used for open-ended object recognition through a meta-active learning technique. We demonstrate, in both simulation and real-world settings, the ability of our approach to grasp previously unseen objects and to rapidly learn new object categories on-site from very few examples. A video of these experiments is available online at: https://youtu.be/n9SMpuEkOgk
