Crowd counting via learning perspective for multi-scale multi-view Web images

Estimating the number of people in Web images still remains a challenging problem owing to the perspective variation, different views, and diverse backgrounds. Existing deep learning models still have difficulties in dealing with scenarios where the size of a person is either extremely large or extremely small. In this paper, we propose a novel perspective-aware architecture to estimate the number of people in a crowd in web images. Specifically, we use a two-stage framework, where we first learn a policy network to infer the perspective of the target scene, which outputs a scale label for the subsequent perspective normalization. Next, given the aligned inputs, we further adjust the scale-specific counting network to regress the final count. Experiments on challenging datasets demonstrate our approach can deal with a large perspective variation and that we have achieved state-of-theart results.

[1]  Hai Tao,et al.  A Viewpoint Invariant Approach for Crowd Counting , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[2]  Hong-Yuan Mark Liao,et al.  Cross-Camera Knowledge Transfer for Multiview People Counting , 2015, IEEE Transactions on Image Processing.

[3]  Haizhou Ai,et al.  End-to-end crowd counting via joint learning local and global count , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Ullrich Köthe,et al.  Learning to count with regression forest and structured labels , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Ivan Laptev,et al.  Data-driven crowd analysis in videos , 2011, ICCV.

[9]  Shaogang Gong,et al.  Cumulative Attribute Space for Age and Crowd Density Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Shaogang Gong,et al.  From Semi-supervised to Transfer Counting of Crowds , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Daniel Oñoro-Rubio,et al.  Towards Perspective-Free Object Counting with Deep Learning , 2016, ECCV.

[13]  Xiaogang Wang,et al.  Deeply learned attributes for crowded scene understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Haroon Idrees,et al.  Detecting Humans in Dense Crowds Using Locally-Consistent Scale Prior and Global Occlusion Reasoning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Svetha Venkatesh,et al.  Face Recognition Using Kernel Ridge Regression , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Yandong Tang,et al.  Flow mosaicking: Real-time pedestrian counting without scene-specific learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Antoni B. Chan,et al.  Crossing the Line: Crowd Counting by Integer Programming with Local Features , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Haroon Idrees,et al.  Multi-source Multi-scale Counting in Extremely Dense Crowd Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Ryuzo Okada,et al.  COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Daniel P. Huttenlocher,et al.  Efficient Belief Propagation for Early Vision , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[23]  Xin Geng,et al.  Crowd counting in public video surveillance by label distribution learning , 2015, Neurocomputing.

[24]  Nuno Vasconcelos,et al.  Bayesian Model Adaptation for Crowd Counts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Serge J. Belongie,et al.  Counting Crowded Moving Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Yangsheng Xu,et al.  Crowd Density Estimation Using Texture Analysis and Learning , 2006, 2006 IEEE International Conference on Robotics and Biomimetics.

[27]  Mubarak Shah,et al.  A Lagrangian Particle Dynamics Approach for Crowd Flow Segmentation and Stability Analysis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Shaogang Gong,et al.  Feature Mining for Localised Crowd Counting , 2012, BMVC.

[29]  Nuno Vasconcelos,et al.  Privacy preserving crowd monitoring: Counting people without people models or tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Andrew Zisserman,et al.  Interactive Object Counting , 2014, ECCV.

[31]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.