Adaptive Context Learning Network for Crowd Counting

Crowd counting aims to accurately estimate the number of people in images taken from unconstrained surveillance scenes. The problem is challenging because of scale variations and perspective distortions in the input. Previous methods try to strengthen the representation by using multi-scale features of the scene image. However, most of these methods add or fuse the features directly, so features of different scales are treated as equally influential. In this paper, we propose a novel architecture called the adaptive context learning network (ACLNet), which incorporates feature context at multiple levels. In this architecture, the original image features are enhanced by a multi-level feature generating module, and the resulting multi-level features are then up-sampled to the same size and re-weighted before fusion. Because ACLNet adaptively incorporates the context information present in sub-regions of various scales, it enhances the representational ability of the multi-level features. We perform several experiments on the public ShanghaiTech (Part A and Part B), UCF_CC_50 and NWPU-Crowd datasets. Our proposed ACLNet achieves state-of-the-art results compared with existing methods.
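The re-weighted fusion described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ACLNet implementation: the multi-level features are approximated by average pooling a single 2D map at several window sizes, up-sampling is nearest-neighbour, and the per-level weights come from a softmax over scores that a real network would learn. All function names, pool sizes and score values are hypothetical.

```python
import math


def avg_pool(feat, k):
    """Average-pool a 2D map (list of lists) with a k x k window and stride k."""
    h, w = len(feat), len(feat[0])
    return [
        [
            sum(feat[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample_nearest(feat, k):
    """Nearest-neighbour up-sampling of a 2D map by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in feat for _ in range(k)]


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_adaptive(feat, pool_sizes, scores):
    """Build one context map per pool size, resize each back to the input
    size, and combine them with softmax-normalised weights.

    Assumes the height and width of `feat` are divisible by every pool size.
    `scores` stands in for learned parameters (an assumption of this sketch).
    """
    weights = softmax(scores)
    h, w = len(feat), len(feat[0])
    fused = [[0.0] * w for _ in range(h)]
    for k, wt in zip(pool_sizes, weights):
        level = upsample_nearest(avg_pool(feat, k), k)
        for i in range(h):
            for j in range(w):
                fused[i][j] += wt * level[i][j]
    return fused, weights


# Toy 4 x 4 feature map and three context levels (values are illustrative).
feat = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
fused, weights = fuse_adaptive(feat, pool_sizes=[1, 2, 4], scores=[0.0, 1.0, 2.0])
```

With equal scores the three levels contribute equally, which corresponds to the plain fusion that the abstract argues against; unequal scores let one context scale dominate. In this toy version the total of the fused map equals the total of the input, because average pooling followed by nearest-neighbour up-sampling preserves the sum and the weights sum to one.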
