Multimodal Deep-Learning for Object Recognition Combining Camera and LIDAR Data

Object detection and recognition are key components of autonomous robotic vehicles, as evidenced by the robotics community's continuous work on object detection and sensory perception systems. This paper presents a study of multisensor (camera and LIDAR) late fusion strategies for object recognition. LIDAR data is processed both as 3D points and as a 2D depth map (DM), obtained by projecting the LIDAR 3D point cloud onto a 2D image plane and then upsampling the result to generate a high-resolution 2D range view. A CNN (Inception V3) is used to classify the RGB images and the DMs (LIDAR modality). A 3D network (PointNet), which performs classification directly on the 3D point clouds, is also considered in the experiments. One motivation for this work is to incorporate the distance to the objects, as measured by the LIDAR, as a cue for improving classification performance. A new range-based average weighting strategy is proposed, which takes into account the relationship between the deep models' performance and the distance to the objects. A classification dataset based on the KITTI database is used to evaluate the deep models and to support the experiments. We report extensive results for single-modality approaches, i.e., using the RGB and LIDAR models individually, and for late-fusion multimodal approaches.
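To make the idea of range-based average weighting concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each model's weight in a given distance bin is proportional to that model's validation accuracy in the bin. The bin edges, the accuracy values, and the names `BIN_EDGES`, `VAL_ACCURACY`, `range_weights`, and `fuse` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical distance bins in metres: [0,10), [10,20), [20,40), [40,inf).
BIN_EDGES = [10.0, 20.0, 40.0]

# Hypothetical per-bin validation accuracies for each single-modality model.
# These numbers are illustrative only; they are not results from the paper.
VAL_ACCURACY = {
    "rgb":   [0.95, 0.93, 0.88, 0.80],
    "lidar": [0.94, 0.90, 0.80, 0.60],
}


def range_weights(distance_m):
    """Return per-model weights (summing to 1) for the bin containing distance_m."""
    bin_index = bisect_right(BIN_EDGES, distance_m)
    raw = {name: acc[bin_index] for name, acc in VAL_ACCURACY.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def fuse(scores, distance_m):
    """Late fusion: range-weighted average of per-model class-probability vectors.

    scores maps a model name to its class-probability list for one object.
    """
    weights = range_weights(distance_m)
    n_classes = len(next(iter(scores.values())))
    return [
        sum(weights[name] * scores[name][c] for name in scores)
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    object_scores = {"rgb": [0.6, 0.4], "lidar": [0.2, 0.8]}
    print(fuse(object_scores, distance_m=5.0))   # near object: weights almost equal
    print(fuse(object_scores, distance_m=50.0))  # far object: RGB weighted more
```

With these illustrative accuracies, the LIDAR-based model's weight shrinks as the object moves farther away, so the fused score leans toward the RGB model at long range. Other weighting rules, for example weights fitted on a validation set, would slot into `range_weights` without changing `fuse`.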
