Fine-Grained Crowd Counting

Current crowd counting algorithms estimate only the total number of people in an image and provide no fine-grained information about the crowd. For many practical applications, the total count is less useful than the number of people in each sub-category. For example, knowing how many people are waiting in line or browsing can help retail stores; knowing how many are standing or sitting can help restaurants and cafeterias; and knowing how many are behaving violently or non-violently can help police with crowd management. In this paper, we propose fine-grained crowd counting, which divides a crowd into categories based on the low-level behavior attributes of the individuals (e.g., standing/sitting or violent behavior) and then counts the number of people in each category. To enable research in this area, we construct a new dataset of four real-world fine-grained counting tasks: traveling direction on a sidewalk, standing or sitting, waiting in line or not, and exhibiting violent behavior or not. Since the appearance features of different crowd categories are similar, the challenge of fine-grained crowd counting is to effectively utilize contextual information to distinguish between categories. We propose a two-branch architecture, consisting of a density map estimation branch and a semantic segmentation branch, together with two refinement strategies for improving the predictions of the two branches. First, to encode contextual information, we propose feature propagation guided by the density map prediction, which eliminates the effect of background features during propagation. Second, we propose a complementary attention model to share information between the two branches. Experimental results confirm the effectiveness of our method.
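
The abstract does not spell out how the outputs of the two branches are turned into per-category counts. One plausible reading, offered here only as a minimal sketch and not as the authors' implementation, is to weight the predicted density map by the per-pixel category probabilities from the segmentation branch and sum each weighted map. The function name `fine_grained_counts`, the array shapes, and the softmax weighting are all assumptions made for illustration.

```python
import numpy as np

def fine_grained_counts(density, seg_logits):
    """Split a crowd density map into per-category counts (illustrative only).

    density:    (H, W) non-negative density map whose sum is the total count.
    seg_logits: (K, H, W) per-pixel scores for K crowd categories.
    Returns a length-K array of estimated counts, one per category.
    """
    # Softmax over the category axis (assumed way to get soft assignments).
    shifted = seg_logits - seg_logits.max(axis=0, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)

    # Weight the density map by each category's probability and integrate.
    per_category_density = probs * density[None, :, :]
    return per_category_density.sum(axis=(1, 2))

# Toy example: one person in the top half, two people in the bottom half.
density = np.zeros((4, 4))
density[0, 0] = 1.0
density[3, 3] = 2.0

# Hypothetical segmentation scores: category 0 on top, category 1 on bottom.
seg_logits = np.zeros((2, 4, 4))
seg_logits[0, :2, :] = 50.0
seg_logits[1, 2:, :] = 50.0

counts = fine_grained_counts(density, seg_logits)
print(counts)
```

Because the category probabilities sum to one at every pixel, the per-category counts in this sketch always add up to the total count given by the density map, which keeps the fine-grained output consistent with ordinary crowd counting.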
