DVFENet: Dual-branch voxel feature extraction network for 3D object detection

Abstract 3D object detection based on LiDAR point cloud has wide applications in autonomous driving and robotics. Recently, many approaches use voxelization representation in feature extraction and apply 3D convolution neural networks for 3D object detection. How to get expressive 3D voxelization representation is important for the detection performance. Therefore, we propose a new 3D object detection framework (DVFENet) based on dual-branch voxel feature extraction, which can provide rich and complete 3D information. The first branch is a graph-attention-network-based voxel feature extraction, which applies an improved voxel graph attention feature extractor (VGAFE) on large-scale voxelization. This branch uses graph convolution networks with an attention mechanism to extract more local neighborhood and context information. The second branch is a 3D-sparse-convolution-based voxel feature extraction that captures finer geometric features based on small-scale voxelization. We also design a decoupled RPN module that can obtain task-specific features to reduce the task conflict. Experiments on the challenging KITTI 3D object detection benchmark and nuScenes detection task show that our method achieve good performance. At the same time, we conduct extensive experiments to verify the effectiveness of each component.

[1]  Steven Lake Waslander,et al.  Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[2]  Xiaogang Wang,et al.  From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Dushyant Rao,et al.  Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[4]  Baoquan Chen,et al.  PointCNN: Convolution On $\mathcal{X}$-Transformed Points , 2018 .

[5]  Danfei Xu,et al.  PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Bernard Ghanem,et al.  PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement , 2019, ArXiv.

[7]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Henry Leung,et al.  Overview of Environment Perception for Intelligent Vehicles , 2017, IEEE Transactions on Intelligent Transportation Systems.

[9]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[10]  Bin Yang,et al.  PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Horst-Michael Groß,et al.  Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds , 2018, ECCV Workshops.

[12]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Xiaoli Hao,et al.  SARPNET: Shape attention regional proposal network for liDAR-based 3D object detection , 2020, Neurocomputing.

[14]  Lu Yuan,et al.  Rethinking Classification and Localization for Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Daniel Cohen-Or,et al.  ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning , 2018, ACM Trans. Graph..

[16]  Xiaoyong Shen,et al.  STD: Sparse-to-Dense 3D Object Detector for Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[19]  Jiaya Jia,et al.  Fast Point R-CNN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jianping An,et al.  Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds , 2020, Sensors.

[22]  Bin Yang,et al.  Multi-Task Multi-Sensor Fusion for 3D Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Sebastian Thrun,et al.  Towards fully autonomous driving: Systems and algorithms , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[25]  Bo Li,et al.  3D fully convolutional network for vehicle detection in point cloud , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Zhixin Wang,et al.  Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[27]  Bin Yang,et al.  Deep Continuous Fusion for Multi-sensor 3D Object Detection , 2018, ECCV.

[28]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Shuangjie Xu,et al.  HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[32]  Marcelo H. Ang,et al.  A General Pipeline for 3D Detection of Vehicles , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[33]  Hui Zhou,et al.  SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Xin Zhao,et al.  TANet: Robust 3D Object Detection from Point Clouds with Triple Attention , 2019, AAAI.

[35]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Shiming Xiang,et al.  Relation-Shape Convolutional Neural Network for Point Cloud Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Fernando García,et al.  BirdNet: A 3D Object Detection Framework from LiDAR Information , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[39]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Jinwhan Kim,et al.  Precise Localization and Mapping in Indoor Parking Structures via Parameterized SLAM , 2019, IEEE Transactions on Intelligent Transportation Systems.

[41]  Bin Yang,et al.  SBNet: Sparse Blocks Network for Fast Inference , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Tong Boon Tang,et al.  Vehicle Detection Techniques for Collision Avoidance Systems: A Review , 2015, IEEE Transactions on Intelligent Transportation Systems.

[43]  Guanglu Song,et al.  Revisiting the Sibling Head in Object Detector , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Ruigang Yang,et al.  IoU Loss for 2D/3D Object Detection , 2019, 2019 International Conference on 3D Vision (3DV).

[45]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[46]  Shu Liu,et al.  IPOD: Intensive Point-based Object Detector for Point Cloud , 2018, ArXiv.