V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction

In this paper, we explore the use of vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles. By intelligently aggregating the information received from multiple nearby vehicles, we can observe the same scene from different viewpoints. This allows us to see through occlusions and detect actors at long range, where the observations are very sparse or non-existent. We also show that our approach of sending compressed deep feature map activations achieves high accuracy while satisfying communication bandwidth requirements.
