Towards Raw Sensor Fusion in 3D Object Detection

The paper focuses on the problem of raw-data fusion in neural-network-based 3D object detection architectures. We consider the case of autonomous driving with data from camera and LiDAR sensors. Understanding the vehicle's surroundings is a crucial task in autonomous driving, since any subsequent action depends strongly on it. In this paper we present an alternative method of fusing camera image information with LiDAR point clouds at a close-to-raw level of abstraction. Our results suggest that this approach improves the average precision of 3D bounding-box detection for cyclists (and possibly other objects) in sparse point clouds compared to a baseline architecture without low-level fusion. The proposed approach has been evaluated on the KITTI dataset, which contains driving scenes with corresponding camera and LiDAR data. The long-term goal of our research is to develop a neural network architecture for environment perception that fuses multi-sensor data at the earliest possible stage, thus leveraging the full benefit of inter-sensor synergies.
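The abstract does not spell out the fusion mechanism, but a common way to combine the two modalities at a near-raw level is to project each LiDAR point into the camera image via the calibration matrix and append the sampled pixel colour to the point's raw attributes. The sketch below illustrates this idea only; the function name and the exact feature layout are assumptions, not the paper's actual method (KITTI does provide per-frame projection matrices such as `P2`).

```python
import numpy as np

def fuse_lidar_with_image(points, image, P):
    """Append per-point RGB sampled from the camera image to raw
    LiDAR points (x, y, z, reflectance) -- an early-fusion sketch.

    points : (N, 4) LiDAR points, xyz expressed in the camera frame
    image  : (H, W, 3) uint8 RGB image
    P      : (3, 4) camera projection matrix (e.g. KITTI's P2)
    """
    # Homogeneous coordinates, then project onto the image plane.
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = xyz1 @ P.T
    uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide

    # Keep points in front of the camera that land inside the image.
    h, w = image.shape[:2]
    valid = (uvw[:, 2] > 0) \
        & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
        & (uv[:, 1] >= 0) & (uv[:, 1] < h)

    u = uv[valid, 0].astype(int)
    v = uv[valid, 1].astype(int)
    rgb = image[v, u] / 255.0              # per-point colour features

    # (M, 7) fused points: x, y, z, reflectance, R, G, B
    return np.hstack([points[valid], rgb])
```

A point-wise network such as PointNet can then consume these 7-dimensional inputs directly, so the colour information enters the pipeline before any detection-level decisions are made.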
