Deep Hough Voting for 3D Object Detection in Point Clouds

Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data and as generic as possible. However, due to the sparse nature of the data -- samples from 2D manifolds in 3D space -- we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.

[1]  P.V.C. Hough,et al.  Machine Analysis of Bubble Chamber Pictures , 1959 .

[2]  Dana H. Ballard,et al.  Generalizing the Hough transform to detect arbitrary shapes , 1981, Pattern Recognit..

[3]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[4]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[5]  Juergen Gall,et al.  Class-specific Hough forests for object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jitendra Malik,et al.  Object detection using a max-margin Hough transform , 2009, CVPR.

[7]  Luc Van Gool,et al.  Orientation invariant 3D object classification using hough transform based methods , 2010, 3DOR '10.

[8]  Silvio Savarese,et al.  Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery , 2010, ECCV.

[9]  Luc Van Gool,et al.  Scene Cut: Class-Specific Object Detection and Segmentation in 3D Scenes , 2011, 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission.

[10]  K. Lingemann,et al.  The 3D Hough Transform for plane detection in point clouds: A review and a new accumulator design , 2011 .

[11]  Luc Van Gool,et al.  Hough Forests for Object Detection, Tracking, and Action Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[13]  Ke Xie,et al.  A search-classify approach for cluttered indoor scene understanding , 2012, ACM Trans. Graph..

[14]  Konrad Schindler,et al.  IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS , 2012, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

[15]  Silvio Savarese,et al.  Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Björn Stenger,et al.  Demisting the Hough Transform for 3D Shape Recognition and Registration , 2014, International Journal of Computer Vision.

[18]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[19]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Leonidas J. Guibas,et al.  Database‐Assisted Object Retrieval for Real‐Time 3D Reconstruction , 2015, Comput. Graph. Forum.

[21]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Nassir Navab,et al.  Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation , 2016, ECCV.

[23]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[24]  C. Damm,et al.  Object detection in 3 D point clouds , 2016 .

[25]  Erik B. Sudderth,et al.  Three-Dimensional Object Detection and Layout Prediction Using Clouds of Oriented Gradients , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jianxiong Xiao,et al.  Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Wang Li,et al.  Vehicle Logo Retrieval Based on Hough Transform and Deep Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[28]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[30]  Seyed-Ahmad Ahmadi,et al.  Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound , 2016, Comput. Vis. Image Underst..

[31]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[32]  Yang Liu,et al.  O-CNN , 2017, ACM Trans. Graph..

[33]  Alexander M. Bronstein,et al.  ASIST: Automatic semantically invariant scene transformation , 2015, Comput. Vis. Image Underst..

[34]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Victor S. Lempitsky,et al.  Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Bernard Ghanem,et al.  2D-Driven 3D Object Detection in RGB-D Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Thomas Brox,et al.  Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Yaron Lipman,et al.  Point convolutional neural networks by extension operators , 2018, ACM Trans. Graph..

[42]  Laurens van der Maaten,et al.  3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Sainan Liu,et al.  Attentional ShapeContextNet for Point Cloud Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Maks Ovsjanikov,et al.  PCPNet Learning Local Shape Properties from Raw Point Clouds , 2017, Comput. Graph. Forum.

[46]  Dong Tian,et al.  FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Andrea Vedaldi,et al.  Semi-convolutional Operators for Instance Segmentation , 2018, ECCV.

[48]  Wei Wu,et al.  PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[49]  Martin Simonovsky,et al.  Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Yifan Xu,et al.  SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters , 2018, ECCV.

[52]  Subhransu Maji,et al.  SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Vladlen Koltun,et al.  Tangent Convolutions for Dense Prediction in 3D , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Ye Duan,et al.  PointGrid: A Deep Network for 3D Shape Understanding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Leonidas J. Guibas,et al.  GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[58]  Matthias Nießner,et al.  3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).