Paper Information - RSANet: Deep Recurrent Scale-Aware Network for Crowd Counting

Most recent works have made significant progress in crowd counting by fusing multi-scale features directly, via weighted sum or concatenation, to handle large scale variation. Meanwhile, very little attention has been paid to predicting high-resolution density maps, and low-resolution predictions lead to inaccurate counting results. In this paper, we present a novel recurrent scale-aware network (RSANet) that generates a high-resolution density map using a scale-aware feature fusion approach. Within this network, we introduce a coarse-to-fine scheme that progressively restores a high-resolution feature map from a low-resolution one using stacked dilated convolution blocks. We then incorporate recurrent modules to capture dynamic scale-aware information and, through multi-scale feature fusion, to aid the restoration of high-resolution feature maps, from which a high-resolution density map is generated. We also train with a multi-resolution supervision strategy to improve the performance of our network. Extensive experiments on three challenging crowd counting datasets demonstrate the effectiveness of the proposed method.
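
The abstract does not spell out how the multi-resolution supervision is computed, so the following is only a minimal sketch of one plausible reading. It assumes that the ground-truth density map is sum-pooled to each output resolution (so the total count is preserved) and that per-resolution mean squared errors are combined with fixed weights. The function names `sum_pool`, `mse`, `count`, and `multi_resolution_loss`, the pooling factors, and the weights are all hypothetical and are not taken from the paper.

```python
def sum_pool(density, factor):
    """Downsample a 2-D density map by summing each factor x factor block.

    Summing (rather than averaging) keeps the total count unchanged.
    """
    rows, cols = len(density), len(density[0])
    if rows % factor or cols % factor:
        raise ValueError("map size must be divisible by the pooling factor")
    return [
        [
            sum(
                density[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def mse(pred, target):
    """Mean squared error between two equally sized 2-D maps."""
    n = len(target) * len(target[0])
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ) / n


def count(density):
    """Estimated crowd count: the sum over all cells of a density map."""
    return sum(sum(row) for row in density)


def multi_resolution_loss(preds, gt_density, factors, weights):
    """Weighted sum of per-resolution MSE losses (assumed form, see lead-in).

    preds[i] is the predicted map at downsampling factor factors[i];
    the ground truth is sum-pooled to the matching resolution.
    """
    total = 0.0
    for pred, factor, weight in zip(preds, factors, weights):
        target = sum_pool(gt_density, factor)
        total += weight * mse(pred, target)
    return total


# Toy 4x4 ground truth containing two people, each integrating to 1.0.
gt = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25, 0.25],
]
half = sum_pool(gt, 2)  # 2x2 map at half resolution

# A half-resolution prediction that misses the second person.
missed = [[1.0, 0.0], [0.0, 0.0]]
loss = multi_resolution_loss([gt, missed], gt, factors=[1, 2], weights=[1.0, 0.5])
```

In this toy case the half-resolution target keeps the same count as the full-resolution map, and only the coarse prediction contributes to the loss. A real training setup would use a deep learning framework and the weighting chosen by the authors, which this page does not state.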
