Vehicle Tracking Using Surveillance With Multimodal Data Fusion

Vehicle location prediction, or vehicle tracking, is an important problem for connected vehicles. The task is difficult when only a single data modality is available, which can introduce bias and limit accuracy. As sensor networks in connected vehicles mature, multimodal data are becoming accessible; we therefore propose a framework for vehicle tracking with multimodal data fusion. Specifically, we fuse the outputs of two modalities, images and velocities. Images, processed by a vehicle-detection module, provide visual features of the vehicles, while velocity estimation narrows the plausible locations of the target vehicles, reducing the number of candidates to be compared and thus the time and computational cost. Our detection model is a color faster R-CNN that takes both the texture and the color of vehicles as input, and velocity estimation is performed with the Kalman filter, a classical tracking method. Finally, a multimodal data fusion method integrates these outputs to complete the vehicle-tracking task. Experimental results demonstrate the efficiency of our approach, which can track vehicles across a series of surveillance cameras in urban areas.
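The velocity-estimation step described above is a standard application of the Kalman filter with a constant-velocity motion model: the filter observes noisy positions and recursively estimates both position and velocity, and the predicted position can then be used to gate the set of detection candidates. The sketch below is a minimal, hypothetical 1-D illustration of that idea (the paper's actual filter, state dimension, and noise settings are not specified here); all parameter names and values are assumptions for demonstration.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, meas_var=1.0, proc_var=0.1):
    """Constant-velocity Kalman filter over noisy 1-D position measurements.

    Returns a list of (position, velocity) estimates, one per measurement.
    Parameters (dt, noise variances) are illustrative assumptions.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition: x' = x + v*dt
    H = np.array([[1.0, 0.0]])                   # we observe position only
    Q = proc_var * np.array([[dt**4 / 4, dt**3 / 2],
                             [dt**3 / 2, dt**2]])  # process-noise covariance
    R = np.array([[meas_var]])                   # measurement-noise covariance
    x = np.array([[measurements[0]], [0.0]])     # initial state: first reading, zero velocity
    P = np.eye(2) * 1e3                          # large initial uncertainty

    estimates = []
    for z in measurements:
        # Predict step: propagate state and covariance through the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new measurement.
        y = np.array([[z]]) - H @ x              # innovation
        S = H @ P @ H.T + R                      # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append((float(x[0, 0]), float(x[1, 0])))
    return estimates
```

In a tracking pipeline, the predicted position from the filter would serve as the center of a search region, so only detections falling near it need to be compared against the target's appearance features, which is how velocity estimation reduces candidate count in the framework described above.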
