Divide and Count: Generic Object Counting by Image Divisions

We propose a general object counting method that does not use any prior category information. We learn from local image divisions to predict global image-level counts without using any form of local annotations. Our method separates the input image into a set of image divisions—each fully covering the image. Each image division is composed of a set of region proposals or uniform grid cells. Our approach learns in an end-to-end deep learning architecture to predict global image-level counts from local image divisions. The method incorporates a counting layer which predicts object counts in the complete image, by enforcing consistency in counts when dealing with overlapping image regions. Our counting layer is based on the inclusion–exclusion principle from set theory. We analyze the individual building blocks of our proposed approach on Pascal-VOC2007 and evaluate our method on the MS-COCO large scale generic object data set as well as on three class-specific counting data sets: UCSD pedestrian data set, and CARPK, and PUCPR+ car data sets.

[1]  Serge J. Belongie,et al.  Counting Crowded Moving Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[3]  Tieniu Tan,et al.  Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection , 2008, 2008 19th International Conference on Pattern Recognition.

[4]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Pascal Mettes,et al.  Nature Conservation Drones for Automatic Localization and Counting of Animals , 2014, ECCV Workshops.

[6]  Hai Tao,et al.  A Viewpoint Invariant Approach for Crowd Counting , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[7]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[8]  Nilanjan Ray,et al.  Cell Counting by Regression Using Convolutional Neural Network , 2016, ECCV Workshops.

[9]  Mathieu Salzmann,et al.  Deep Convolutional Neural Networks for Human Embryonic Cell Counting , 2016, ECCV Workshops.

[10]  Ryuzo Okada,et al.  COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[12]  Nuno Vasconcelos,et al.  Bayesian Poisson regression for crowd counting , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[13]  Nuno Vasconcelos,et al.  Counting People With Low-Level Features and Bayesian Regression , 2012, IEEE Transactions on Image Processing.

[14]  Ullrich Köthe,et al.  Learning to count with regression forest and structured labels , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[15]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Andrew Zisserman,et al.  Counting in the Wild , 2016, ECCV.

[17]  Saturnino Maldonado-Bascón,et al.  Extremely Overlapping Vehicle Counting , 2015, IbPRIA.

[18]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Alan Fern,et al.  Person count localization in videos from noisy foreground and detections , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Luiz Eduardo Soares de Oliveira,et al.  PKLot - A robust dataset for parking lot classification , 2015, Expert Syst. Appl..

[21]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[22]  Haroon Idrees,et al.  Multi-source Multi-scale Counting in Extremely Dense Crowd Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xu Liu,et al.  Highway Vehicle Counting in Compressed Domain , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Lucia Maddalena,et al.  People counting by learning their appearance in a multi-view camera environment , 2014, Pattern Recognit. Lett..

[26]  Daniel Oñoro-Rubio,et al.  Towards Perspective-Free Object Counting with Deep Learning , 2016, ECCV.

[27]  Sridha Sridharan,et al.  Scene invariant multi camera crowd counting , 2014, Pattern Recognit. Lett..

[28]  Wesam A. Sakla,et al.  A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning , 2016, ECCV.

[29]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[30]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[32]  Richard S. Zemel,et al.  End-to-End Instance Segmentation and Counting with Recurrent Attention , 2016, ArXiv.

[33]  H. Poincaré,et al.  The Inclusion–Exclusion Principle , 2021, An Invitation to Combinatorics.

[34]  Zheng Zhang,et al.  Learning to Point and Count , 2015, ArXiv.

[35]  Nuno Vasconcelos,et al.  Privacy preserving crowd monitoring: Counting people without people models or tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Nuno Vasconcelos,et al.  Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[38]  Winston H. Hsu,et al.  Drone-Based Object Counting by Spatially Regularized Regional Proposal Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Melih Kandemir,et al.  Gaussian Process Density Counting from Weak Supervision , 2016, ECCV.

[41]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Greg Mori,et al.  Machine Vision and Applications Manuscript No. Bearcam: Automated Wildlife Monitoring at the Arctic Circle , 2022 .

[43]  Richard S. Zemel,et al.  End-to-End Instance Segmentation with Recurrent Attention , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jake K. Aggarwal,et al.  Counting Vehicles in Highway Surveillance Videos , 2010, 2010 20th International Conference on Pattern Recognition.

[45]  Ramprasaath R. Selvaraju,et al.  Counting Everyday Objects in Everyday Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Shaogang Gong,et al.  Feature Mining for Localised Crowd Counting , 2012, BMVC.

[47]  Andrew Zisserman,et al.  Microscopy cell counting and detection with fully convolutional regression networks , 2018, Comput. methods Biomech. Biomed. Eng. Imaging Vis..

[48]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[50]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[51]  Lior Wolf,et al.  Learning to Count with CNN Boosting , 2016, ECCV.

[52]  Jordi Vitrià,et al.  Learning to count with deep object features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[53]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.