论文信息 - From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network

From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network

3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-<inline-formula><tex-math notation="LaTeX">$A^2$</tex-math><alternatives><mml:math><mml:msup><mml:mi>A</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="shi-ieq1-2977026.gif"/></alternatives></inline-formula> net). The whole framework consists of the part-aware stage and the part-aggregation stage. First, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-<inline-formula><tex-math notation="LaTeX">$A^2$</tex-math><alternatives><mml:math><mml:msup><mml:mi>A</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="shi-ieq2-2977026.gif"/></alternatives></inline-formula> net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data.

Xiaogang Wang | Zhe Wang | Jianping Shi | Hongsheng Li | Shaoshuai Shi

[1] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[3] Bernard Ghanem,et al. 3D Instance Segmentation via Multi-Task Metric Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4] Chi-Wing Fu,et al. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Steven L. Waslander,et al. Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Xiaogang Wang,et al. GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Zhixin Wang,et al. Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8] Shu Liu,et al. Associatively Segmenting Instances and Semantics in Point Clouds , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Shaojie Shen,et al. Stereo R-CNN Based 3D Object Detection for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Yan Wang,et al. Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Matthias Nießner,et al. 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Jiong Yang,et al. PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Xiaogang Wang,et al. PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Leonidas J. Guibas,et al. GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Adrien Gaidon,et al. ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Fuxin Li,et al. PointConv: Deep Convolutional Networks on 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Hei Law,et al. CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[18] Shuai Yi,et al. FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction , 2019, NeurIPS.

[19] Wei Wu,et al. PointCNN: Convolution On X-Transformed Points , 2018, NeurIPS.

[20] Bin Yang,et al. HDNET: Exploiting HD Maps for 3D Object Detection , 2018, CoRL.

[21] Bo Li,et al. SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[22] Bin Yang,et al. Deep Continuous Fusion for Multi-sensor 3D Object Detection , 2018, ECCV.

[23] Horst-Michael Groß,et al. Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds , 2018, ECCV Workshops.

[24] Yuning Jiang,et al. Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[25] Bin Yang,et al. PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26] Raquel Urtasun,et al. Deep Parametric Continuous Convolutional Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Bin Yang,et al. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28] Ulrich Neumann,et al. Recurrent Slice Networks for 3D Segmentation of Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Marcelo H. Ang,et al. A General Pipeline for 3D Detection of Vehicles , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[30] Steven Lake Waslander,et al. Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31] Nuno Vasconcelos,et al. Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Danfei Xu,et al. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33] Laurens van der Maaten,et al. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34] Ulrich Neumann,et al. SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35] Leonidas J. Guibas,et al. Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Yin Zhou,et al. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37] Luc Van Gool,et al. Semantic Instance Segmentation with a Discriminative Loss Function , 2017, ArXiv.

[38] Leonidas J. Guibas,et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[39] Laurens van der Maaten,et al. Submanifold Sparse Convolutional Networks , 2017, ArXiv.

[40] Thierry Chateau,et al. Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Yi Li,et al. Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Leonidas J. Guibas,et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Jana Kosecka,et al. 3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46] Min Bai,et al. Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Ji Wan,et al. Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Sanja Fidler,et al. Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[50] Jianxiong Xiao,et al. Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52] Huimin Ma,et al. 3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[53] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[55] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[56] Silvio Savarese,et al. A coarse-to-fine model for 3D pose estimation and sub-category recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57] Kostas Daniilidis,et al. Single image 3D object detection and pose estimation for grasping , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[58] Antonio Torralba,et al. FPM: Fine Pose Parts-Based Model with 3D CAD Models , 2014, ECCV.

[59] Jianxiong Xiao,et al. Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[60] Sven J. Dickinson,et al. 3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model , 2012, NIPS.

[61] Peter V. Gehler,et al. 3D2PM - 3D Deformable Part Models , 2012, ECCV.

[62] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[63] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.