PointFPN: A Frustum-based Feature Pyramid Network for 3D Object Detection

3D object detection is crucial to the reliability and stability of autonomous driving systems. In recent years, researchers have made considerable progress in 3D object detection by combining image and point cloud features. However, detection accuracy still leaves much room for improvement, especially for small objects. In this paper, we propose a frustum-based 3D object detection model named PointFPN. The core idea of PointFPN is to learn expressive semantic and contextual information for small objects, e.g., pedestrians and cyclists in an urban street scene. To detect 3D objects, our model uses frustums to bridge the gap between images and point clouds and thereby generate proposals. A feature pyramid structure is then designed to extract and fuse multi-level features of target objects represented by point clouds. In addition, we develop a multi-level regression network that estimates different parameters of the 3D bounding boxes at different feature levels. With this design, our model learns discriminative features that are highly relevant to the bounding box parameters at each feature level. Experiments show that our model is effective in detecting small objects and is robust to sparse point clouds. Our model demonstrates state-of-the-art performance on small object detection on the KITTI benchmark.
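
As a rough illustration of how a frustum links a 2D image detection to a 3D point cloud, the sketch below keeps only the points whose image projection falls inside a 2D bounding box. It is not the PointFPN implementation: it assumes a simple pinhole camera, points already expressed in the camera frame, and the function name `points_in_frustum` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]         # (x, y, z) in camera coordinates
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def points_in_frustum(points: List[Point], box: Box2D,
                      fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Return the points whose pinhole projection lies inside the 2D box.

    Assumption: an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        if z <= 0.0:
            continue  # behind the camera, so it cannot be in the frustum
        u = fx * x / z + cx  # horizontal pixel coordinate
        v = fy * y / z + cy  # vertical pixel coordinate
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept


# Toy example: four points, one 2D detection box around the image centre.
cloud = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -5.0), (0.5, 0.2, 10.0)]
frustum_points = points_in_frustum(cloud, (40.0, 40.0, 60.0, 60.0),
                                   fx=100.0, fy=100.0, cx=50.0, cy=50.0)
```

In a frustum-based pipeline, the retained points would then be passed to a point cloud network for feature extraction and box regression; that stage is outside the scope of this sketch.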
