Scene Invariant Virtual Gates Using DNNs

Understanding where people are located and how they move through an environment is critical for operators of large public spaces such as shopping centers and large public infrastructure such as airports. Automated analysis of CCTV footage is increasingly used to address this need through techniques that count crowd sizes, estimate crowd density, and estimate the throughput of people into and/or out of a choke-point. A limitation of CCTV-based approaches, however, is the need to train models specific to each view, which quickly becomes problematic for large environments with hundreds or thousands of cameras. While there has been some success in developing scene-invariant crowd counting and crowd density estimation approaches, much less attention has been given to scene-invariant solutions for throughput estimation. In this paper, we investigate the use of convolutional neural network and long short-term memory architectures to estimate pedestrian throughput from arbitrary CCTV viewpoints. To develop and demonstrate our approach, we present a new 22-view database featuring 44 h of pedestrian throughput annotation, containing over 11 000 annotated people. Using the proposed approach, we show that we are able to outperform a scene-dependent approach across a diverse set of challenging viewpoints.
