Second-order convolutional network for crowd counting

Single-image crowd counting remains challenging, primarily because of large scale variations, perspective distortion, and non-uniform crowd distribution. In this paper, we propose a novel architecture, referred to as the Second-Order Convolutional Network (SOCN), that addresses this task by improving the feature transformation capability of the network. The proposed SOCN uses a convolutional neural network as its backbone. We introduce three cascaded second-order blocks after the backbone to augment the family of transformation operations and increase the nonlinearity of the network, allowing it to extract multi-scale and discriminative features. Furthermore, we design a context attention module (CAM) with dilated convolutions that assigns weights to the score map of each second-order block, so that the features that contribute to counting are highlighted. We conduct experiments on the ShanghaiTech and UCF_CC_50 datasets, and the results demonstrate the effectiveness of our method.
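As a rough illustration of the two ideas described above, the following minimal sketch shows (a) a second-order combination of two branch responses and (b) attention-weighted fusion of per-block score maps into a single density map. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the second-order term is taken to be an element-wise product added to the sum of the two branches, the attention weights are taken to be a per-pixel softmax over blocks, and the function names `second_order_combine`, `fuse_score_maps`, and `crowd_count` are hypothetical. Maps are plain nested lists rather than network tensors.

```python
import math


def second_order_combine(branch_a, branch_b):
    """Combine two 2D responses as a + b + a * b, element by element.

    Assumption: the product a * b stands in for the second-order term;
    the abstract does not specify the block's actual formula.
    """
    return [
        [a + b + a * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(branch_a, branch_b)
    ]


def fuse_score_maps(score_maps, attention_logits):
    """Fuse per-block score maps using per-pixel attention weights.

    Assumption: weights are a softmax over blocks at each pixel, so they
    are positive and sum to 1 at every location.
    """
    n_blocks = len(score_maps)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            logits = [attention_logits[k][i][j] for k in range(n_blocks)]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            fused[i][j] = sum(
                (e / total) * score_maps[k][i][j] for k, e in enumerate(exps)
            )
    return fused


def crowd_count(density_map):
    """Estimate the count as the sum of all values in the density map."""
    return sum(sum(row) for row in density_map)
```

With equal attention logits the fusion reduces to a plain average of the block outputs; larger logits for one block shift the fused map toward that block's score map at the corresponding pixels.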
