Paper information - RGB-D object detection and semantic segmentation for autonomous manipulation in clutter

Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cameras, for which we developed a depth fusion method. Employing pretrained features makes learning from small annotated robotic datasets possible. We evaluate our approach on two challenging datasets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo placed second in the Stowing task and third in the Picking task; and one captured in disaster-response scenarios. The experiments show that object detection and semantic segmentation complement each other and can be combined to yield reliable object perception.
