Understanding Traffic Density from Large-Scale Web Camera Data

Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion, and large perspective. To understand traffic density in depth, we explore both optimization-based and deep-learning-based methods. To avoid detecting or tracking individual vehicles, both methods map dense image features to vehicle density: one uses rank-constrained regression and the other uses fully convolutional networks (FCN). The regression-based method learns different weights for different blocks of the image, which embeds road geometry and significantly reduces the error induced by camera perspective. The FCN-based method jointly estimates vehicle density and vehicle count within a residual learning framework to perform end-to-end dense prediction, which allows arbitrary image resolution and adapts to different vehicle scales and perspectives. We analyze and compare both methods, and use insights from the optimization-based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and datasets. On the public dataset TRANCOS, the FCN-based method reduces the mean absolute error (MAE) from 10.99 to 5.31 compared with the state-of-the-art baseline.
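The relationship between a density map, a vehicle count, and the MAE metric mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the common convention of building a ground-truth density map by placing one unit-mass Gaussian at each annotated vehicle centre, so that summing the map gives the count. The function names, the kernel width `sigma`, and the example coordinates are all hypothetical.

```python
import math


def gaussian_density_map(height, width, centers, sigma=2.0):
    """Build a density map with one unit-mass Gaussian per vehicle centre.

    `centers` is a list of (row, col) annotations. The Gaussian kernel and
    `sigma` are assumptions made for illustration only.
    """
    density = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        # Unnormalised Gaussian weights evaluated over the whole image.
        weights = [
            [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalise so each vehicle contributes exactly 1 to the map's sum.
        for y in range(height):
            for x in range(width):
                density[y][x] += weights[y][x] / total
    return density


def count_from_density(density):
    """The vehicle count is the sum (discrete integral) of the density map."""
    return sum(map(sum, density))


def mean_absolute_error(predicted_counts, true_counts):
    """MAE between per-image predicted and ground-truth counts."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("count lists must have the same length")
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Three hypothetical vehicle annotations in a 20 x 30 image.
    density = gaussian_density_map(20, 30, [(5, 5), (10, 20), (15, 8)])
    print(round(count_from_density(density), 6))
    print(mean_absolute_error([3.0, 5.0], [4, 5]))
```

Because each kernel is normalised over the image, the count is recovered without detecting or tracking any individual vehicle, which is the property both methods in the abstract rely on.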
