A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition

This paper addresses the problem of simultaneous 3D reconstruction and material recognition and segmentation. Enabling robots to recognise different materials (concrete, metal etc.) in a scene is important for many tasks, e.g. robotic interventions in nuclear decommissioning. Previous work on 3D semantic reconstruction has predominantly focused on recognition of everyday domestic objects (tables, chairs etc.), whereas previous work on material recognition has largely been confined to single 2D images without any 3D reconstruction. Meanwhile, most 3D semantic reconstruction methods rely on computationally expensive post-processing, using Fully-Connected Conditional Random Fields (CRFs), to achieve consistent segmentations. In contrast, we propose a deep learning method which performs 3D reconstruction while simultaneously recognising different types of materials and labeling them at the pixel level. Unlike previous methods, we propose a fully end-to-end approach, which does not require hand-crafted features or CRF post-processing. Instead, we use only learned features, and the CRF segmentation constraints are incorporated inside the fully end-to-end learned system. We present the results of experiments, in which we trained our system to perform real-time 3D semantic reconstruction for 23 different materials in a real-world application. The run-time performance of the system can be boosted to around 10Hz, using a conventional GPU, which is enough to achieve realtime semantic reconstruction using a 30fps RGB-D camera. To the best of our knowledge, this work is the first real-time end-to-end system for simultaneous 3D reconstruction and material recognition.

[1]  Dongbing Gu,et al.  Building a grid-point cloud-semantic map based on graph for the navigation of intelligent wheelchair , 2015, 2015 21st International Conference on Automation and Computing (ICAC).

[2]  Nassir Navab,et al.  When 2.5D is not enough: Simultaneous reconstruction, segmentation and recognition on dense SLAM , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Paul H. J. Kelly,et al.  Dense planar SLAM , 2014, 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR).

[4]  Ko Nishino,et al.  Material Recognition from Local Appearance in Global Context , 2016, ArXiv.

[5]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Patrick Pérez,et al.  Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Ko Nishino,et al.  Visual Material Traits: Recognizing Per-Pixel Material Context , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[8]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[9]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03 - 07, 2014 , 2014, ACM Multimedia.

[12]  Rong Xiao,et al.  Pairwise Rotation Invariant Co-Occurrence Local Binary Pattern , 2014, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Ales Leonardis,et al.  Evaluating Deep Convolutional Neural Networks for Material Classification , 2017, VISIGRAPP.

[14]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[15]  Wolfram Burgard,et al.  G2o: A general framework for graph optimization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[16]  Alexei A. Efros,et al.  A 4D Light-Field Dataset and CNN Architectures for Material Recognition , 2016, ECCV.

[17]  J. Paul Siebert,et al.  Recognising the clothing categories from free-configuration using Gaussian-Process-based interactive perception , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[18]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Noah Snavely,et al.  Material recognition in the wild with the Materials in Context Database , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Edward H. Adelson,et al.  Exploring features in a Bayesian framework for material recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Li Sun,et al.  Accurate garment surface analysis using an active stereo robot head with application to dual-arm flattening , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Li Sun,et al.  Weakly-supervised DCNN for RGB-D Object Recognition in Real-World Applications Which Lack Large-scale Annotated Training Data , 2017, ArXiv.

[23]  Derek Hoiem,et al.  Geometry-Informed Material Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[25]  Wolfram Burgard,et al.  3-D Mapping With an RGB-D Camera , 2014, IEEE Transactions on Robotics.

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Xiaofeng Ren,et al.  Toward Robust Material Recognition for Everyday Objects , 2011, BMVC.

[28]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Bastian Leibe,et al.  Dense 3D semantic mapping of indoor scenes from RGB-D images , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).