Comparative study on crowd counting with deep learning

This paper compares four leading crowd-counting models and evaluates their strengths based on their performance. DSNet proposes a dense dilated convolution block, in which the dilated layers are densely connected to one another to preserve information from continuously varying scales. Three such blocks are cascaded and linked by dense residual connections to widen the range of scales covered by the network, and a novel multi-scale density-level consistency loss is introduced to improve performance. SFANet consists of two main components: a VGG backbone CNN as the front-end feature extractor and a dual-path multi-scale fusion network as the back end that generates the density map. One path highlights crowded regions in the image; the other fuses multi-scale features and generates the final high-quality, high-resolution density maps. MANet (Multi-scale Attention Network) presents a new soft attention mechanism that learns a set of masks, together with a scale-aware loss function that regularizes the learning of the different branches and guides each to specialize on a particular scale. Bayesian Loss is a novel loss function that constructs a density contribution model from the point annotations. We also analyze the results of the four convolutional neural networks, extract common patterns in their network structure, and identify promising directions for researchers in this fast-growing area.
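To make the idea of a density contribution model built from point annotations more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a Gaussian likelihood around each annotated head position, a uniform prior over annotations, and an L1 penalty between each annotation's expected count and 1. The function name `bayesian_loss`, the `sigma` parameter, and the handling of images without annotations are illustrative assumptions.

```python
import math


def bayesian_loss(density, points, sigma=1.0):
    """Sketch of a Bayesian-style counting loss (illustrative, assumed form).

    density: 2D list of predicted density values, indexed [row][col].
    points:  list of (row, col) head annotations.
    Returns the sum over annotations of |1 - expected count|.
    """
    if not points:
        # Assumption: with no annotations, all predicted mass is penalized.
        return sum(abs(v) for row in density for v in row)

    expected = [0.0] * len(points)
    for r, row in enumerate(density):
        for c, value in enumerate(row):
            # Squared distance from this pixel to every annotation.
            d2 = [(r - pr) ** 2 + (c - pc) ** 2 for pr, pc in points]
            # Subtract the minimum before exponentiating so that at least
            # one likelihood equals 1 and the normalizer cannot underflow.
            d2_min = min(d2)
            likes = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
            total = sum(likes)
            # Posterior that this pixel belongs to annotation n, times the
            # predicted density, is the pixel's contribution to count n.
            for n, like in enumerate(likes):
                expected[n] += like / total * value

    # Each annotated person should account for exactly one unit of count.
    return sum(abs(1.0 - e) for e in expected)


if __name__ == "__main__":
    # Hypothetical 4x4 prediction with unit mass on each annotated pixel.
    pts = [(0, 0), (3, 3)]
    pred = [[0.0] * 4 for _ in range(4)]
    for pr, pc in pts:
        pred[pr][pc] = 1.0
    print(bayesian_loss(pred, pts, sigma=0.5))
```

Because the posteriors at each pixel sum to one, the expected counts always add up to the total predicted density, so the loss supervises the count per annotation rather than a fixed pixel-wise target map.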
