V3I-STAL: Visual Vehicle-to-Vehicle Interaction via Simultaneous Tracking and Localization

This paper investigates a visual interaction system for vehicle-to-vehicle (V2V) platform, called V3I. Our system employs common visual cameras that are mounted on connected vehicles to perceive the existence of isolated vehicles in the same roadway, and provides human drivers with imagery situational awareness. This allows effective interactions between vehicles even with a low permeation rate of V2V devices. The underlying research problem for V3I includes two aspects: i) tracking isolated vehicles of interest over time through local cameras; ii) at each time-step fusing the results of local visual perceptions to obtain a global location map that involves both isolated and connected vehicles. In this paper, we introduce a unified probabilistic approach to solve the above two problems, i.e., tracking and localization, in a joint fashion. Our approach will explore both the visual features of individual vehicles in images and the pair-wise spatial relationships between vehicles. We develop a fast Markov Chain Monte Carlo (MCMC) algorithm to search the joint solution space efficiently, which enables real-time application. To evaluate the performance of the proposed approach, we collect and annotate a set of video sequences captured with a group of vehicle-resident cameras. Extensive experiments with comparisons clearly demonstrate that the proposed V3I approach can precisely recover the dynamic location map of the surrounding and thus enable direct visual interactions between vehicles .

[1]  Zoubin Ghahramani,et al.  Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.

[2]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[4]  Marc Pollefeys,et al.  Class Specific 3D Object Shape Priors Using Surface Normals , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Xin Zhang,et al.  Object class detection: A survey , 2013, CSUR.

[6]  Michèle Audin,et al.  Hamiltonian systems and their integrability , 2008 .

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Harry Shum,et al.  Image segmentation by data driven Markov chain Monte Carlo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[10]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[11]  Charles E. Thorpe,et al.  Simultaneous localization and mapping with detection and tracking of moving objects , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[12]  Shiyu Song,et al.  Robust Scale Estimation in Real-Time Monocular SFM for Autonomous Driving , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Konrad Schindler,et al.  Discrete-continuous optimization for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Martin Lauer,et al.  3D Traffic Scene Understanding From Movable Platforms , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Mohamed Aly,et al.  Real time detection of lane markers in urban streets , 2008, 2008 IEEE Intelligent Vehicles Symposium.

[16]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[17]  Liang Lin,et al.  Contextualized trajectory parsing with spatiotemporal graph. , 2013, IEEE transactions on pattern analysis and machine intelligence.

[18]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  S. Shankar Sastry,et al.  Markov Chain Monte Carlo Data Association for Multi-Target Tracking , 2009, IEEE Transactions on Automatic Control.

[20]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[21]  J. A. Roecker A class of near optimal JPDA algorithms , 1994 .

[22]  J. A. Roecker,et al.  Suboptimal joint probabilistic data association , 1993 .

[23]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Silvio Savarese,et al.  Learning to Track: Online Multi-object Tracking by Decision Making , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Andreas Geiger,et al.  Understanding High-Level Semantics by Modeling Traffic Patterns , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Nitin H. Vaidya,et al.  A vehicle-to-vehicle communication protocol for cooperative collision warning , 2004, The First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services, 2004. MOBIQUITOUS 2004..

[28]  Song-Chun Zhu,et al.  Single-View 3D Scene Parsing by Attributed Grammar , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Lehel Csató,et al.  Sparse On-Line Gaussian Processes , 2002, Neural Computation.

[30]  Deva Ramanan,et al.  Efficiently Scaling up Crowdsourced Video Annotation , 2012, International Journal of Computer Vision.

[31]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[32]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[33]  Deva Ramanan,et al.  Analysis by Synthesis: 3D Object Recognition by Object Reconstruction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  O. D. Almeida,et al.  Hamiltonian Systems: Chaos and Quantization , 1990 .

[35]  Antonio Criminisi,et al.  Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning , 2012, Found. Trends Comput. Graph. Vis..

[36]  Neil D. Lawrence,et al.  Fast Sparse Gaussian Process Methods: The Informative Vector Machine , 2002, NIPS.

[37]  Paul A. Viola,et al.  Multiple Instance Boosting for Object Detection , 2005, NIPS.

[38]  James M. Rehg,et al.  Joint Semantic Segmentation and 3D Reconstruction from Monocular Video , 2014, ECCV.

[39]  Adrian Barbu,et al.  Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Song-Chun Zhu,et al.  From Information Scaling of Natural Images to Regimes of Statistical Models , 2007 .

[41]  Jing Zhang,et al.  Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  M. Opper Sparse Online Gaussian Processes , 2008 .

[43]  Ramakant Nevatia,et al.  Learning to associate: HybridBoosted multi-target tracker for crowded scene , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Mubarak Shah,et al.  A Multiview Approach to Tracking People in Crowded Scenes Using a Planar Homography Constraint , 2006, ECCV.

[45]  Zoubin Ghahramani,et al.  Variable Noise and Dimensionality Reduction for Sparse Gaussian processes , 2006, UAI.

[46]  Theodore L. Willke,et al.  A survey of inter-vehicle communication protocols and their applications , 2009, IEEE Communications Surveys & Tutorials.

[47]  Neil D. Lawrence,et al.  Learning to learn with the informative vector machine , 2004, ICML.

[48]  Alexander J. Smola,et al.  Sparse Greedy Gaussian Process Regression , 2000, NIPS.

[49]  Alexander J. Smola,et al.  Support Vector Regression Machines , 1996, NIPS.

[50]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Paolo Pirjanian,et al.  The vSLAM Algorithm for Robust Localization and Mapping , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[52]  Trevor Darrell,et al.  Gaussian Processes for Object Categorization , 2010, International Journal of Computer Vision.

[53]  Xin Yu,et al.  Object Tracking With Multi-View Support Vector Machines , 2015, IEEE Transactions on Multimedia.

[54]  Sanja Fidler,et al.  Box in the Box: Joint 3D Layout and Object Reasoning from Single Images , 2013, 2013 IEEE International Conference on Computer Vision.

[55]  Andreas Geiger,et al.  FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  Ramakant Nevatia,et al.  Multiple Target Tracking by Learning-Based Hierarchical Association of Detection Responses , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Gérard G. Medioni,et al.  Multiple Target Tracking Using Spatio-Temporal Markov Chain Monte Carlo Data Association , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.