RSANet: Deep Recurrent Scale-Aware Network for Crowd Counting

Recent works have made significant progress in crowd counting by fusing multi-scale features directly, through weighted summation or concatenation, to handle large scale variations. However, little attention has been paid to predicting high-resolution density maps, and low-resolution density maps lead to inaccurate counting results. In this paper, we present a novel recurrent scale-aware network (RSANet) that generates a high-resolution density map using a scale-aware feature fusion approach. Within this network, we introduce a coarse-to-fine scheme that progressively restores a high-resolution feature map from a low-resolution feature map using stacked dilated convolution blocks. We then incorporate recurrent modules that capture dynamic scale-aware information and, through multi-scale feature fusion, aid the restoration of the high-resolution feature maps from which the high-resolution density map is generated. We also train the network with a multi-resolution supervision strategy to improve its performance. Extensive experiments on three challenging crowd counting datasets demonstrate the effectiveness of the proposed method.
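
The multi-resolution supervision idea can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: the helper names (`sum_pool`, `mse`, `multi_resolution_loss`), the use of mean squared error, and the choice of sum pooling to build lower-resolution targets. Sum pooling is chosen because it keeps the total count of the density map unchanged at every resolution.

```python
from typing import List

Map = List[List[float]]  # a density map as rows of per-pixel densities


def sum_pool(density: Map, factor: int) -> Map:
    """Downsample by summing each factor x factor block (count-preserving)."""
    h, w = len(density), len(density[0])
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the pooling factor")
    return [
        [
            sum(density[i + di][j + dj]
                for di in range(factor) for dj in range(factor))
            for j in range(0, w, factor)
        ]
        for i in range(0, h, factor)
    ]


def count(density: Map) -> float:
    """Estimated number of people: the sum over the density map."""
    return sum(sum(row) for row in density)


def mse(pred: Map, target: Map) -> float:
    """Mean squared error between two maps of the same size."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for p_row, t_row in zip(pred, target)
               for p, t in zip(p_row, t_row)) / n


def multi_resolution_loss(preds: List[Map], gt: Map,
                          factors: List[int], weights: List[float]) -> float:
    """Weighted sum of per-resolution MSE terms (hypothetical formulation).

    preds[k] is the prediction at 1/factors[k] of the ground-truth resolution.
    """
    return sum(w * mse(p, sum_pool(gt, f))
               for p, f, w in zip(preds, factors, weights))


# Toy 4x4 ground truth containing two people.
gt = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25, 0.25],
]
coarse_gt = sum_pool(gt, 2)          # 2x2 target with the same total count
empty_coarse = [[0.0, 0.0], [0.0, 0.0]]
loss = multi_resolution_loss([gt, empty_coarse], gt, [1, 2], [1.0, 1.0])
```

In this toy case the full-resolution prediction matches the ground truth exactly, so only the coarse term contributes to `loss`. In a real network each prediction would come from a different stage of the coarse-to-fine decoder, and the per-scale weights would be tuned.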
