Crowd counting via Multi-Scale Adversarial Convolutional Neural Networks

Abstract
The purpose of crowd counting is to estimate the number of pedestrians in crowd images. Crowd counting, or density estimation, is an extremely challenging computer vision task because of large scale variations and dense scenes. Current methods address these issues by combining multi-scale Convolutional Neural Networks (CNNs) with different receptive fields. In this paper, a novel end-to-end architecture, the Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN), is proposed to generate crowd density maps and estimate crowd counts. First, a multi-scale network extracts globally relevant features from the crowd image; fractionally-strided convolutional layers then up-sample the output to recover crucial details lost in the earlier max-pooling layers. An adversarial loss is employed directly to push the estimated density map into the subspace of realistic density maps, reducing blur in the estimate. The network is trained end-to-end with a joint scheme that combines the adversarial loss and a Euclidean loss to improve density estimation performance. Extensive experiments on available crowd counting datasets demonstrate significant improvements and the superiority of the proposed approach over existing state-of-the-art approaches.
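The abstract states that fractionally-strided convolutional layers up-sample the features to recover detail lost to max pooling. The following is a minimal single-channel NumPy sketch of that operation (a transposed convolution), not the actual MSA-CNN layer configuration: the function name `conv_transpose2d`, the kernel size 4, stride 2, padding 1, the 16x16 feature map, and the constant kernel weights are all assumptions, chosen so that each layer exactly doubles the spatial size.

```python
import numpy as np


def conv_transpose2d(x: np.ndarray, kernel: np.ndarray, stride: int = 2, padding: int = 1) -> np.ndarray:
    """Single-channel 2-D fractionally-strided (transposed) convolution.

    Hypothetical helper for illustration. Each input value scatters a scaled
    copy of the kernel into the output, with copies placed `stride` pixels
    apart; `padding` pixels are then cropped from every border.
    """
    h, w = x.shape
    k = kernel.shape[0]
    # Full (uncropped) output size: (in - 1) * stride + kernel.
    full = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    # Cropping yields the usual size formula (in - 1) * stride + kernel - 2 * padding.
    return full[padding:full.shape[0] - padding, padding:full.shape[1] - padding]


# Assumed setting: a 64x64 input reduced to 16x16 by two 2x2 max-pooling layers.
feat = np.random.default_rng(0).random((16, 16))
kernel = np.full((4, 4), 0.25)  # placeholder weights; a real network learns these
up = conv_transpose2d(conv_transpose2d(feat, kernel), kernel)
print(up.shape)  # (64, 64)
```

With stride 2, two such layers undo the 4x down-sampling of two 2x2 pooling stages, so the estimated density map can be produced at the input resolution instead of being interpolated afterwards.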

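The joint training scheme combines a Euclidean loss and an adversarial loss into one objective. The abstract does not give the exact formulas, so this NumPy sketch assumes the commonly used pixel-wise Euclidean loss L_E = 1/(2N) * sum_i ||F(X_i) - Y_i||^2 over a batch of N estimated maps F(X_i) and ground-truth maps Y_i, a non-saturating generator-side adversarial loss L_A = -(1/N) * sum_i log p_i, where p_i is a discriminator's probability that the i-th estimated map is real, and a hypothetical weight `lambda_adv`. The crowd count is read off as the sum of the density map, the standard convention in density-based counting.

```python
import numpy as np


def euclidean_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Assumed form: L_E = 1/(2N) * sum_i ||pred_i - gt_i||^2, N = batch size."""
    n = pred.shape[0]
    return float(np.sum((pred - gt) ** 2) / (2 * n))


def adversarial_loss(d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed non-saturating generator loss: -mean(log p).

    d_fake holds the discriminator's probabilities that each estimated
    density map is real; the loss is small when these are close to 1.
    """
    return float(-np.mean(np.log(d_fake + eps)))


def joint_loss(pred: np.ndarray, gt: np.ndarray, d_fake: np.ndarray,
               lambda_adv: float = 1e-3) -> float:
    """Combined objective L = L_E + lambda_adv * L_A (lambda_adv is hypothetical)."""
    return euclidean_loss(pred, gt) + lambda_adv * adversarial_loss(d_fake)


def count_from_density(density: np.ndarray) -> np.ndarray:
    """Estimated count per image = sum of its density map."""
    return density.reshape(density.shape[0], -1).sum(axis=1)


# Toy batch of two 8x8 ground-truth maps (unsmoothed head points for brevity).
gt = np.zeros((2, 8, 8))
gt[0, 2, 3] = gt[0, 5, 5] = 1.0  # image 0: two annotated heads
gt[1, 4, 4] = 1.0                # image 1: one annotated head
pred = gt * 0.9                  # a slightly under-counting estimate
d_fake = np.array([0.5, 0.5])    # an undecided discriminator

print(count_from_density(gt))    # [2. 1.]
print(joint_loss(pred, gt, d_fake))
```

The Euclidean term keeps each estimated map close to its ground truth pixel by pixel, while the adversarial term penalizes maps the discriminator can tell apart from real ones, which is the mechanism the abstract credits with reducing blur.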