Density-aware and background-aware network for crowd counting via multi-task learning

Abstract In this paper, we propose a density-aware and background-aware network via multi-task learning (MTL-DB) for crowd counting. Multi-task joint training enables the model to capture high-level semantic information about crowd density and background, which can help optimize the generation of density maps. First, MTL-DB uses the first ten layers of VGG-16 with Batch Normalization as a front-end to extract primary features that are shared by all tasks. A multi-task back-end then combines the main task of density map estimation with two auxiliary tasks: density classification and background segmentation. The density classification task captures density-related information with a fully connected classifier, while the background segmentation task applies a dilated convolutional network to distinguish pedestrians' head regions from the background. With this high-level semantic awareness, the main task generates estimated density maps using standard convolutional layers. Furthermore, a multi-task joint loss is proposed to improve the quality of the estimated density maps. Extensive experiments on three challenging crowd datasets (ShanghaiTech Part A and Part B, UCF_CC_50, and UCF_QNRF) verify the effectiveness of this multi-task learning model. MTL-DB outperforms other multi-task learning methods on both Part A and Part B of the ShanghaiTech dataset.
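
The abstract does not spell out the form of the multi-task joint loss, so the following is only a minimal sketch of one common way such a loss could be assembled: a weighted sum of a density-map regression term and the two auxiliary terms. The choice of mean squared error, softmax cross-entropy, and binary cross-entropy, as well as the function names and the weights `w_cls` and `w_seg`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between two equally sized flat lists (density maps)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy_loss(logits, label):
    """Softmax cross-entropy for one sample with an integer density-class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def bce_loss(probs, mask, eps=1e-7):
    """Binary cross-entropy between per-pixel head probabilities and a 0/1 mask."""
    total = 0.0
    for p, y in zip(probs, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def joint_loss(density_pred, density_gt,
               class_logits, class_label,
               seg_probs, seg_mask,
               w_cls=0.1, w_seg=0.1):
    """Hypothetical joint loss: main task plus weighted auxiliary tasks.

    w_cls and w_seg are illustrative weights, not values from the paper.
    """
    main = mse_loss(density_pred, density_gt)
    aux_cls = cross_entropy_loss(class_logits, class_label)
    aux_seg = bce_loss(seg_probs, seg_mask)
    return main + w_cls * aux_cls + w_seg * aux_seg


# Toy usage with flattened 2-pixel maps and two density classes.
loss = joint_loss(
    density_pred=[1.0, 2.0], density_gt=[1.0, 4.0],
    class_logits=[0.0, 0.0], class_label=0,
    seg_probs=[0.5, 0.5], seg_mask=[1, 0],
)
print(loss)
```

In this sketch the auxiliary weights control how strongly density classification and background segmentation influence the shared front-end features; setting both to zero reduces the objective to plain density-map regression.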
