Paper Information - Feature Map Transformation for Multi-sensor Fusion in Object Detection Networks for Autonomous Driving

We present a general framework for fusing pre-trained object detection networks for multiple sensor modalities in autonomous cars at an intermediate stage. The key innovation is an autoencoder-inspired Transformer module that transforms both the perspective and the feature activation characteristics of one sensor modality into those of another. The transformed feature maps can be combined with those of a modality-native feature extractor through a simple fusion scheme to enhance performance and reliability. Our approach is not limited to specific object detection network types. In contrast to other methods, our framework allows fusion of pre-trained object detection networks and fuses sensor modalities at a single stage, resulting in a modular and traceable architecture. We demonstrate the effectiveness of the proposed scheme by fusing camera and Lidar information to detect objects on our own dataset as well as on the KITTI dataset.
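The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The layer types, the bottleneck size, the feature-map shapes, the camera-to-Lidar direction, and the two fusion modes are all assumptions chosen for illustration, and the weights are random rather than trained. The sketch only shows the shape of the idea: an encoder-decoder maps a feature map from one modality into the layout of another, and the result is combined with the native feature map at a single stage.

```python
import numpy as np


def init_transformer(in_shape, out_shape, bottleneck=32, seed=0):
    """Create random weights for a hypothetical encoder-decoder transformer.

    Assumption: fully connected layers over the flattened feature map stand in
    for whatever layers the paper actually uses.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = int(np.prod(in_shape)), int(np.prod(out_shape))
    return {
        "w_enc": rng.normal(0.0, 0.01, (n_in, bottleneck)),
        "w_dec": rng.normal(0.0, 0.01, (bottleneck, n_out)),
        "out_shape": tuple(out_shape),
    }


def transform(params, fmap):
    """Map a source-modality feature map into the target modality's layout."""
    code = np.maximum(fmap.reshape(-1) @ params["w_enc"], 0.0)  # encoder + ReLU
    out = np.maximum(code @ params["w_dec"], 0.0)               # decoder + ReLU
    return out.reshape(params["out_shape"])


def fuse(native, transformed, mode="concat"):
    """Combine native and transformed feature maps at a single stage.

    Both modes are illustrative guesses at a "simple fusion scheme".
    """
    if native.shape[:2] != transformed.shape[:2]:
        raise ValueError("feature maps must share the same spatial size")
    if mode == "concat":
        return np.concatenate([native, transformed], axis=-1)
    if mode == "mean":
        if native.shape != transformed.shape:
            raise ValueError("mean fusion needs identical shapes")
        return 0.5 * (native + transformed)
    raise ValueError(f"unknown fusion mode: {mode}")


# Hypothetical feature maps, shaped (height, width, channels).
rng = np.random.default_rng(1)
camera_fmap = rng.random((12, 40, 16))  # front-view camera features (assumed shape)
lidar_fmap = rng.random((25, 25, 8))    # Lidar features (assumed shape)

params = init_transformer(camera_fmap.shape, lidar_fmap.shape)
camera_in_lidar = transform(params, camera_fmap)
fused = fuse(lidar_fmap, camera_in_lidar, mode="concat")
```

In a real system the transformer weights would be learned so that the transformed camera features resemble Lidar-native features. Because the pre-trained feature extractors and detection head stay as they are, the transformer and the fusion step are the only new components.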
