An end-to-end generative adversarial network for crowd counting under complicated scenes

Counting a crowd and analyzing its distribution is a challenging video surveillance task. In this paper, we propose a novel end-to-end method for estimating the number of people in a crowd under complicated scenes. To this end, we apply conditional adversarial networks to translate an input image into its density map. The conditional generative adversarial model is trained on pairs consisting of an input image and its corresponding density image. The proposed method avoids the need to design a complex CNN architecture for extracting task-specific features, and it requires no additional data augmentation. Evaluated on the ShanghaiTech dataset, which consists of two challenging parts, our method produces convincing counting results together with high-quality estimated density images. Moreover, our experiments can be carried out in an efficient and labor-saving way.
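To make the image-to-density-map formulation concrete, the sketch below shows how a count can be read off a density map. The abstract does not say how the ground-truth density images are constructed, so this assumes the common convention of placing a fixed-width Gaussian at each annotated head position; the function names, the `sigma` value, and the boundary handling are illustrative assumptions, not the authors' implementation. It uses only the standard library to stay self-contained.

```python
import math


def density_map(height, width, heads, sigma=2.0):
    """Build a density map from head annotations (assumed convention).

    heads: list of (row, col) integer pixel positions.
    Each head contributes a Gaussian blob whose in-image mass is
    renormalized to exactly 1, so the map sums to the head count.
    """
    radius = int(3 * sigma)  # truncate the kernel at about 3 sigma
    dm = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        cells = []
        total = 0.0
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    w = math.exp(-(dr * dr + dc * dc) / (2 * sigma * sigma))
                    cells.append((rr, cc, w))
                    total += w
        # Renormalize so a head near the border still counts as one person.
        for rr, cc, w in cells:
            dm[rr][cc] += w / total
    return dm


def count_from_density(dm):
    """The estimated crowd count is the sum (discrete integral) of the map."""
    return sum(sum(row) for row in dm)


if __name__ == "__main__":
    heads = [(5, 5), (0, 0), (10, 20)]  # hypothetical annotations
    dm = density_map(16, 32, heads)
    print(round(count_from_density(dm), 6))
```

Under this assumption, a density image predicted by the generator would be summed in the same way to obtain the estimated count for an input image.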
