VANETs Meet Autonomous Vehicles: A Multimodal 3D Environment Learning Approach

In this paper, we design a multimodal framework for object detection, recognition, and mapping that fuses stereo camera frames, Velodyne LIDAR point cloud scans, and Vehicle-to-Vehicle (V2V) Basic Safety Messages (BSMs) exchanged over Dedicated Short Range Communication (DSRC). The framework combines the complementary strengths of each modality: Convolutional Neural Networks (CNNs) extract rich texture descriptions of objects from the 2D images, the 3D LIDAR point cloud provides depth and the distances between objects, and BSM beacons provide awareness of hidden vehicles. We present joint pixel-to-point-cloud and pixel-to-V2V correspondences of objects in frames of driving sequences from the KITTI Vision Benchmark Suite. To obtain these correspondences, we use a semi-supervised manifold alignment approach that maps the recognized persons and cars between camera and LIDAR and between camera and V2V, which share the same underlying manifold.
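
To illustrate the general idea of semi-supervised manifold alignment, the following is a minimal sketch under stated assumptions and not the implementation used in this work. It assumes that each recognized object is already summarized by a feature vector in each modality (for example, a CNN descriptor for the camera and a geometric descriptor for LIDAR), that a few object pairs are known to correspond, and that a dense Gaussian affinity graph is an adequate neighborhood model. The function names, the bandwidth heuristic, and the coupling weight `mu` are all hypothetical choices made for illustration.

```python
import numpy as np


def graph_laplacian(points):
    """Unnormalized Laplacian of a dense Gaussian affinity graph.

    Assumption: the points are distinct, so the median squared
    distance used as the bandwidth is positive.
    """
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    bandwidth = np.median(sq_dists[sq_dists > 0])
    weights = np.exp(-sq_dists / bandwidth)
    np.fill_diagonal(weights, 0.0)
    return np.diag(weights.sum(axis=1)) - weights


def joint_laplacian(x, y, pairs, mu=10.0):
    """Couple the two graphs with an edge of weight mu per known pair.

    pairs holds (i, j) tuples meaning x[i] and y[j] are the same object.
    """
    n_x, n_y = len(x), len(y)
    joint = np.zeros((n_x + n_y, n_x + n_y))
    joint[:n_x, :n_x] = graph_laplacian(x)
    joint[n_x:, n_x:] = graph_laplacian(y)
    for i, j in pairs:
        joint[i, i] += mu
        joint[n_x + j, n_x + j] += mu
        joint[i, n_x + j] -= mu
        joint[n_x + j, i] -= mu
    return joint


def align_manifolds(x, y, pairs, dim=2, mu=10.0):
    """Embed both modalities into one shared low-dimensional space."""
    _, vecs = np.linalg.eigh(joint_laplacian(x, y, pairs, mu))
    # Eigenvalues come back in ascending order; the first eigenvector
    # is the constant one of a connected graph, so it is skipped.
    embedding = vecs[:, 1:dim + 1]
    return embedding[:len(x)], embedding[len(x):]


def match(embed_x, embed_y):
    """For each row of embed_x, index of the nearest row of embed_y."""
    diffs = embed_x[:, None, :] - embed_y[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)
```

Calling `align_manifolds(camera_features, lidar_features, pairs)` with two hypothetical feature arrays returns one embedding per modality, and `match` then proposes a LIDAR object for every camera object. The same calls would apply to camera and V2V features. In practice a sparse k-nearest-neighbor graph is often preferred over the dense graph used here for brevity.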
