SEGCloud: Semantic Segmentation of 3D Point Clouds

3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework for 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI), and fully connected Conditional Random Fields (FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation. The FC-CRF then enforces global consistency and provides fine-grained semantics on the points. We implement the FC-CRF as a differentiable Recurrent NN to allow joint optimization. We evaluate the framework on two indoor and two outdoor 3D datasets (NYU V2, S3DIS, KITTI, Semantic3D.net) and show performance comparable or superior to the state of the art on all datasets.
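The voxel-to-point transfer step can be illustrated with a minimal NumPy sketch of trilinear interpolation. This is not the authors' implementation: the function name `trilinear_interpolate`, the `(X, Y, Z, C)` score layout, and the convention that voxel `(i, j, k)` is centered at `origin + voxel_size * (i, j, k)` are all assumptions made for illustration. Each point receives a weighted average of the class scores of its eight surrounding voxel centers, with weights that sum to one.

```python
import numpy as np

def trilinear_interpolate(voxel_scores, points, voxel_size=1.0, origin=(0.0, 0.0, 0.0)):
    """Transfer per-voxel class scores to arbitrary 3D points.

    voxel_scores: array of shape (X, Y, Z, C), scores at voxel centers (assumed layout).
    points:       array of shape (N, 3), point coordinates.
    Returns an (N, C) array of interpolated scores.
    """
    grid = np.asarray(voxel_scores, dtype=float)
    dims = np.array(grid.shape[:3])

    # Continuous grid coordinates; integer values fall exactly on voxel centers.
    coords = (np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)) / voxel_size
    # Points outside the grid are clamped to the nearest boundary voxel center.
    coords = np.clip(coords, 0.0, dims - 1)

    # Lower corner of the enclosing cell, kept in range so that lo + 1 is valid.
    lo = np.minimum(np.floor(coords).astype(int), np.maximum(dims - 2, 0))
    frac = coords - lo  # fractional position inside the cell, in [0, 1]

    out = np.zeros((coords.shape[0], grid.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                idx = np.minimum(lo + offset, dims - 1)
                # Weight is the product over axes of frac (upper corner) or 1 - frac (lower corner).
                w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                out += w[:, None] * grid[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out
```

Because every output is a fixed linear combination of voxel scores, the operation is differentiable with respect to those scores, which is the property an end-to-end pipeline of this kind relies on.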
