Boosting Multi-Vehicle Tracking with a Joint Object Detection and Viewpoint Estimation Sensor

In this work, we address the problem of multi-vehicle detection and tracking for traffic monitoring applications. We present a novel intelligent visual sensor for tracking-by-detection with simultaneous pose estimation. Essentially, we adapt an Extended Kalman Filter (EKF) to work not only with the vehicle detections but also with their estimated coarse viewpoints, both obtained directly from the vision sensor. We show that enhancing the tracking with observations of the vehicle pose results in a better estimation of the vehicles' trajectories. For the simultaneous object detection and viewpoint estimation task, we present and evaluate two independent solutions. The first is based on a fast GPU implementation of a Histogram of Oriented Gradients (HOG) detector with Support Vector Machines (SVMs). For the second, we modify and train the Faster R-CNN deep learning model so that it recovers not only the object localization but also an estimate of the object's pose. Finally, we publicly release a challenging dataset, the GRAM Road Traffic Monitoring (GRAM-RTM) dataset, which has been specifically designed for evaluating multi-vehicle tracking approaches in the context of traffic monitoring applications. It comprises more than 700 unique vehicles annotated across more than 40,300 frames of three videos. We expect GRAM-RTM to become a benchmark in vehicle detection and tracking, providing the computer vision and intelligent transportation systems communities with a standard set of images, annotations and evaluation procedures for multi-vehicle tracking. We present a thorough experimental evaluation of our approaches on GRAM-RTM, which will be useful for establishing further comparisons. The results confirm that integrating vehicle localizations and pose estimations simultaneously as observations in an EKF improves the tracking results.
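To make the central idea concrete, the following is a minimal sketch of an EKF whose observation vector contains both the detected position and a heading angle derived from the estimated viewpoint. It is not the filter used in this work: the state layout `[x, y, v, theta]`, the constant-speed and constant-heading motion model, the helper names, and the mapping from a coarse viewpoint bin to an angle (`bin_to_angle`) are all assumptions made for illustration. Plain Python lists are used so the example has no dependencies.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def inverse(m):
    # Gauss-Jordan elimination with partial pivoting (small matrices only).
    n = len(m)
    aug = [list(row) + identity(n)[i] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def wrap(angle):
    # Map an angle to the interval [-pi, pi).
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def bin_to_angle(k, n_bins):
    # Hypothetical mapping: coarse viewpoint class k -> angle of the bin.
    return wrap(2.0 * math.pi * k / n_bins)

# Observation matrix: the sensor reports x, y and heading, but not speed.
H = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

def predict(s, P, Q, dt):
    # Assumed motion model: constant speed v along constant heading theta.
    x, y, v, th = s
    s_pred = [x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, v, th]
    # Jacobian of the motion model with respect to the state.
    F = [[1.0, 0.0, math.cos(th) * dt, -v * math.sin(th) * dt],
         [0.0, 1.0, math.sin(th) * dt,  v * math.cos(th) * dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    P_pred = add(matmul(matmul(F, P), transpose(F)), Q)
    return s_pred, P_pred

def update(s, P, z, R):
    # z = [x_detected, y_detected, heading_from_viewpoint]
    innov = [z[0] - s[0], z[1] - s[1], wrap(z[2] - s[3])]
    S = add(matmul(matmul(H, P), transpose(H)), R)
    K = matmul(matmul(P, transpose(H)), inverse(S))
    s_new = [si + sum(k * d for k, d in zip(row, innov)) for si, row in zip(s, K)]
    s_new[3] = wrap(s_new[3])
    P_new = matmul(add(identity(4), matmul(K, H), -1.0), P)
    return s_new, P_new
```

In this sketch a track would call `predict` once per frame and `update` whenever a detection is associated with it, passing the bounding-box centre together with `bin_to_angle(k, n_bins)` for the predicted viewpoint class. Because the heading enters the innovation directly, a viewpoint observation corrects the direction of motion even when the position measurements alone are ambiguous, which is the intuition behind using pose as an additional observation.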
