People Counting in Dense Crowd Images Using Sparse Head Detections

People counting in extremely dense crowds is a challenging problem due to severe occlusions, few pixels per head, cluttered environments, and skewed camera perspectives. In this paper, we present a novel algorithm for people counting in highly dense crowd images. Our approach relies on the fact that the head is the most visible part of an individual in a dense crowd. As such, a head detector can be used to estimate the spatially varying head size, which is the key feature used in our head counting procedure. We leverage a state-of-the-art convolutional neural network for sparse head detection in a dense crowd. After subdividing the image into rectangular patches, we first use a support vector machine binary classifier based on speeded-up robust features (SURF) to label each patch as crowd or not-crowd, and eliminate all not-crowd patches. Regression is then performed on each crowd patch to estimate the average head size. The number of individuals in each patch is estimated by dividing the patch area by the estimated head size. For crowd patches in which no heads are detected, the counts are estimated by distance-based weighted averaging over the counts from neighboring patches. Finally, the individual patch counts are summed to obtain the total count. We evaluate our approach on three publicly available datasets for extremely dense crowds: UCF_CC_50, ShanghaiTech, and AHU-Crowd. Our approach gives results comparable to other state-of-the-art algorithms on these challenging datasets but, unlike those algorithms, does not require the laborious task of obtaining labeled training data of dense crowd images.
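
The counting steps described above (per-patch division, filling in patches without detections, and summation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "head size" means head area in pixels, that the crowd/not-crowd classification has already removed not-crowd patches, and that the distance-based weighting is inverse distance between patch centers; the abstract does not specify the weighting function. All function and parameter names are hypothetical.

```python
import math


def fill_missing_counts(counts, centers, power=1.0):
    """Replace None counts with an inverse-distance weighted average.

    counts  : per-patch counts, None where no head was detected
    centers : (x, y) center of each patch; centers are assumed distinct
    power   : exponent on the distance (an assumption, not from the paper)
    """
    known = [(c, p) for c, p in zip(counts, centers) if c is not None]
    filled = []
    for count, (x, y) in zip(counts, centers):
        if count is not None:
            filled.append(count)
            continue
        if not known:
            # No detections anywhere: nothing to interpolate from.
            filled.append(0.0)
            continue
        weights = [
            1.0 / math.hypot(x - kx, y - ky) ** power
            for _, (kx, ky) in known
        ]
        weighted_sum = sum(w * kc for w, (kc, _) in zip(weights, known))
        filled.append(weighted_sum / sum(weights))
    return filled


def estimate_total(patch_area, head_sizes, centers):
    """Estimate the crowd count from per-patch head sizes.

    patch_area : area of each (equal-sized) crowd patch, in pixels
    head_sizes : estimated head area per patch, None where no head was detected
    centers    : (x, y) center of each patch
    """
    # Patch count = patch area divided by estimated head size.
    counts = [
        patch_area / h if h is not None else None
        for h in head_sizes
    ]
    # Patches without detections borrow from neighbors, weighted by distance.
    counts = fill_missing_counts(counts, centers)
    # The total is the sum of the individual patch counts.
    return sum(counts)


# Three crowd patches of 10,000 pixels each; the middle one has no detections.
total = estimate_total(
    patch_area=10_000,
    head_sizes=[100, None, 400],
    centers=[(0, 0), (1, 0), (3, 0)],
)
print(total)
```

In this toy example the outer patches get counts of 100 and 25, and the middle patch is filled with a value between them that leans toward the nearer patch. A real pipeline would also need the head detector, the patch classifier, and the head-size regression, none of which are sketched here.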
