To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels

3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we designed a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place of the default inner product kernel and exploit the underlying local geometry around each pixel. We outline four such kernels: a dense kernel according to the bag-of-words paradigm, and three graph kernels inspired by recent graph neural network advances: the Transformer, the PointNet, and the Edge Convolution. We also explore cross-modality fusion with the camera image, facilitated by operating in the perspective range image view. Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also efficient in that our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPS and model parameters.

[1]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[3]  Yin Zhou,et al.  End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds , 2019, CoRL.

[4]  Leonidas J. Guibas,et al.  Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[6]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[7]  Laurens van der Maaten,et al.  Submanifold Sparse Convolutional Networks , 2017, ArXiv.

[8]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Carlos Vallespi-Gonzalez,et al.  LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Quoc V. Le,et al.  Improving 3D Object Detection through Progressive Population Based Augmentation , 2020, ECCV.

[12]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[14]  Dragomir Anguelov,et al.  Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection , 2020, CoRL.

[15]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[16]  Duc Thanh Nguyen,et al.  Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Yue Wang,et al.  Pillar-based Object Detection for Autonomous Driving , 2020, ECCV.

[18]  Thomas Funkhouser,et al.  An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds , 2020, ECCV.

[19]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[20]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Tian Xia,et al.  Vehicle Detection from 3D Lidar Using Fully Convolutional Network , 2016, Robotics: Science and Systems.

[22]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[25]  WangYue,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2019 .

[26]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[28]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yin Zhou,et al.  StarNet: Targeted Computation for Object Detection in Point Clouds , 2019, ArXiv.

[31]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[32]  Dragomir Anguelov,et al.  Scalability in Perception for Autonomous Driving: Waymo Open Dataset , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.