Radar Voxel Fusion for 3D Object Detection

Automotive traffic scenes are complex due to the variety of scenarios, objects, and weather conditions that must be handled. In contrast to more constrained environments, such as automated underground trains, automotive perception systems cannot be tailored to a narrow set of specific tasks but must cope with an ever-changing environment and unforeseen events. Since no single sensor can currently perceive all relevant activity in the surroundings reliably, sensor data fusion is applied to capture as much information as possible. Fusing data from different sensors and sensor modalities at a low abstraction level makes it possible to compensate sensor weaknesses and misdetections across sensors before the information-rich raw data are compressed into per-sensor object detections and information is lost. This paper develops a low-level sensor fusion network for 3D object detection that fuses lidar, camera, and radar data. The fusion network is trained and evaluated on the nuScenes data set. On the test set, fusing radar data increases the resulting AP (Average Precision) detection score by about 5.1 % compared to the lidar-only baseline network. The radar fusion proves especially beneficial in adverse conditions such as rain and night scenes. Fusing additional camera data contributes positively only in conjunction with the radar fusion, which shows that interdependencies among the sensors are important for the detection result. Additionally, the paper proposes a novel loss to handle the discontinuity of a simple yaw representation for object detection. The proposed loss improves detection and orientation estimation performance for all sensor input configurations. The code for this research has been made available on GitHub.
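
As an illustration of the low-level (voxel-space) fusion described above, the following minimal sketch scatters lidar and radar points into a shared sparse voxel grid with a common feature layout, as a 3D detection backbone would consume it. All names, feature layouts, grid parameters, and dummy data are assumptions made for this example; the paper's actual pipeline is defined in its published code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy point clouds. Assumed layouts: lidar rows (x, y, z, intensity),
# radar rows (x, y, z, rcs, v_radial); real nuScenes decoding differs.
lidar_pts = np.hstack([rng.uniform(-40, 40, (1000, 3)), rng.uniform(0, 1, (1000, 1))])
radar_pts = np.hstack([rng.uniform(-40, 40, (80, 3)), rng.uniform(-1, 1, (80, 2))])

def voxelize(xyz, features, grid_min, voxel_size):
    """Scatter per-point features into a sparse voxel grid by averaging."""
    idx = np.floor((xyz - grid_min) / voxel_size).astype(np.int64)
    voxels = {}
    for key, feat in zip(map(tuple, idx), features):
        voxels.setdefault(key, []).append(feat)
    return {k: np.mean(np.stack(v), axis=0) for k, v in voxels.items()}

# Pad each modality to a shared feature width so the fused grid has a single
# channel layout that a sparse 3D backbone (e.g. a SECOND-style network) could consume.
lidar_feats = np.pad(lidar_pts[:, 3:4], ((0, 0), (0, 1)))  # [intensity, 0]
radar_feats = radar_pts[:, 3:5]                            # [rcs, v_radial]
grid_min = np.array([-50.0, -50.0, -5.0])
fused = voxelize(np.vstack([lidar_pts[:, :3], radar_pts[:, :3]]),
                 np.vstack([lidar_feats, radar_feats]),
                 grid_min, voxel_size=0.25)
print(len(fused), "occupied voxels")
```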

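The yaw discontinuity mentioned in the abstract arises because headings of +π and −π describe the same orientation yet are maximally distant as raw regression targets. The sketch below shows one common remedy, regressing a (sin, cos) encoding of the angle with a smooth-L1 loss; it only illustrates the underlying problem and is not the paper's novel loss, whose exact formulation is given in the paper and its published code.

```python
import math
import torch
import torch.nn.functional as F

def yaw_loss(pred_sin_cos: torch.Tensor, gt_yaw: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 loss between predicted (sin, cos) pairs and the encoded ground-truth yaw."""
    target = torch.stack([torch.sin(gt_yaw), torch.cos(gt_yaw)], dim=-1)
    return F.smooth_l1_loss(pred_sin_cos, target)

# A raw-angle loss would heavily penalize predicting -pi against a target of +pi,
# although both describe the same orientation; with the (sin, cos) encoding the
# two targets coincide, so an equivalent prediction yields a near-zero loss.
gt = torch.tensor([math.pi, -math.pi])
pred = torch.stack([torch.sin(gt), torch.cos(gt)], dim=-1)
print(yaw_loss(pred, gt))  # ~0
```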