Enhancing Grid-Based 3D Object Detection in Autonomous Driving With Improved Dimensionality Reduction

Point cloud object detection is a pivotal technology in autonomous driving and robotics. Currently, the majority of cutting-edge point cloud detectors utilize Bird’s Eye View (BEV) for detection, as it allows them to take advantage of well-explored 2D detection techniques. Nevertheless, dimensionality reduction of features from 3D space to BEV space unavoidably leads to information loss, and there is a lack of research on this issue. Existing methods typically obtain BEV features by collapsing voxel or point features along the height dimension via a pooling operation or convolution, resulting in a significant decrease in geometric information. To tackle this problem, we present a new point cloud backbone network for grid-based object detection, MDRNet, which is based on adaptive dimensionality reduction and multi-level spatial residual strategies. In MDRNet, the Spatial-aware Dimensionality Reduction (SDR) is designed to dynamically concentrate on the essential components of the object during 3D-to-BEV transformation. Moreover, the Multi-level Spatial Residuals (MSR) strategy is proposed to effectively fuse multi-level spatial information in BEV feature maps. Our MDRNet can be employed on any existing grid-based object detector, resulting in a remarkable improvement in performance. Numerous experiments conducted on nuScenes, KITTI and DAIR-V have shown that MDRNet surpasses existing SOTA approaches. In particular, on the nuScenes dataset, we attained an impressive 7.2% mAP and 5.0% NDS enhancement compared with CenterPoint.

[1]  Simegnew Yihunie Alaba,et al.  WCNN3D: Wavelet Convolutional Neural Network-Based 3D Object Detection for Autonomous Driving , 2022, Sensors.

[2]  Jiaya Jia,et al.  LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kaicheng Yu,et al.  BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework , 2022, NeurIPS.

[4]  Huizi Mao,et al.  BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation , 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).

[5]  Ruifeng Li,et al.  PillarNet: Real-Time and High-Performance Pillar-based 3D Object Detection , 2022, ECCV.

[6]  Jiaya Jia,et al.  Focal Sparse Convolutional Networks for 3D Object Detection , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Zaiqing Nie,et al.  DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  K. Jia,et al.  VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Zhiyuan Zhang,et al.  CVFNet: Real-time 3D Object Detection by Learning Cross View Features , 2022, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  M. Barth,et al.  PillarGrid: Deep Learning-Based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR , 2022, 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC).

[11]  Jonathan Li,et al.  CasA: A Cascade Attention Network for 3-D Object Detection From LiDAR Point Clouds , 2022, IEEE Transactions on Geoscience and Remote Sensing.

[12]  Philipp Krähenbühl,et al.  Multimodal Virtual Point 3D Detection , 2021, NeurIPS.

[13]  Minzhe Niu,et al.  Voxel Transformer for 3D Object Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Dingfu Zhou,et al.  FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection , 2021, 2021 IEEE International Intelligent Transportation Systems Conference (ITSC).

[15]  Wengang Zhou,et al.  Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection , 2020, AAAI.

[16]  Philipp Krähenbühl,et al.  Center-based 3D Object Detection and Tracking , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Zhuo Yang,et al.  MuRF-Net: Multi-Receptive Field Pillars for 3D Object Detection from Point Cloud , 2020, 2020 IEEE Intelligent Vehicles Symposium (IV).

[18]  Yue Wang,et al.  Pillar-based Object Detection for Autonomous Driving , 2020, ECCV.

[19]  Larry S. Davis,et al.  InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling , 2020, ECCV.

[20]  Jun Won Choi,et al.  3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection , 2020, ECCV.

[21]  Vladlen Koltun,et al.  Tracking Objects as Points , 2020, ECCV.

[22]  Yanan Sun,et al.  3DSSD: Point-Based 3D Single Stage Object Detector , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaogang Wang,et al.  PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  A. Yuille,et al.  Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots , 2019, ECCV.

[25]  Alex H. Lang,et al.  PointPainting: Sequential Fusion for 3D Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Alan L. Yuille,et al.  Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization , 2020, NeurIPS.

[28]  Benjin Zhu,et al.  Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection , 2019, ArXiv.

[29]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[30]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[33]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[37]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[40]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..