Semantic Segmentation from Limited Training Data

We present our approach for robotic perception in cluttered scenes that led to winning the recent Amazon Robotics Challenge (ARC) 2017. Next to small objects with shiny and transparent surfaces, the biggest challenge of the 2017 competition was the introduction of unseen categories. In contrast to traditional approaches which require large collections of annotated data and many hours of training, the task here was to obtain a robust perception pipeline with only few minutes of data acquisition and training time. To that end, we present two strategies that we explored. One is a deep metric learning approach that works in three separate steps: semantic-agnostic boundary detection, patch classification and pixel-wise voting. The other is a fully-supervised semantic segmentation approach with efficient dataset collection. We conduct an extensive analysis of the two methods on our ARC 2017 dataset. Interestingly, only few examples of each class are sufficient to fine-tune even very deep convolutional neural networks for this specific task.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Zhao Yi,et al.  A Robust Robot Design for Item Picking , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Jürgen Leitner,et al.  Mechanical Design of a Cartesian Manipulator for Warehouse Pick and Place , 2017 .

[4]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Kuan-Ting Yu,et al.  Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[6]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Li Fei-Fei,et al.  DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Martijn Wisse,et al.  Team Delft's Robot Winner of the Amazon Picking Challenge 2016 , 2016, RoboCup.

[10]  Yasuyuki Matsushita,et al.  GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Sven Behnke,et al.  RGB-D object detection and semantic segmentation for autonomous manipulation in clutter , 2018, Int. J. Robotics Res..

[12]  Sven Behnke,et al.  NimbRo picking: Versatile part handling for warehouse automation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Sven Behnke,et al.  Fast Object Learning and Dual-arm Coordination for Cluttered Stowing, Picking, and Packing , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Stefanie Jegelka,et al.  Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Ian Taylor,et al.  Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[18]  Kuan-Ting Yu,et al.  A Summary of Team MIT's Approach to the Amazon Picking Challenge 2015 , 2016, ArXiv.

[19]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[21]  Pablo Andrés Arbeláez,et al.  Boundary Extraction in Natural Images Using Ultrametric Contour Maps , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[22]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[23]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Oliver Brock,et al.  Lessons from the Amazon Picking Challenge: Four Aspects of Building Robotic Systems , 2016, IJCAI.

[26]  Oliver Brock,et al.  Analysis and Observations From the First Amazon Picking Challenge , 2016, IEEE Transactions on Automation Science and Engineering.

[27]  Jordi Pont-Tuset,et al.  Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Peter I. Corke,et al.  Cartman: The Low-Cost Cartesian Manipulator that Won the Amazon Robotics Challenge , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[29]  Sven Behnke,et al.  Combining Semantic and Geometric Features for Object Class Segmentation of Indoor Scenes , 2017, IEEE Robotics and Automation Letters.

[30]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Gustavo Carneiro,et al.  Smart Mining for Deep Metric Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[35]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[38]  Oliver Brock,et al.  Probabilistic multi-class segmentation for the Amazon Picking Challenge , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).