Towards Partial Supervision for Generic Object Counting in Natural Scenes

Generic object counting in natural scenes is a challenging computer vision problem. Existing approaches either rely on instance-level supervision or absolute count information to train a generic object counter. We introduce a partially supervised setting that significantly reduces the supervision level required for generic object counting. We propose two novel frameworks, named lower-count (LC) and reduced lower-count (RLC), to enable object counting under this setting. Our frameworks are built on a novel dual-branch architecture that has an image classification and a density branch. Our LC framework reduces the annotation cost due to multiple instances in an image by using only lower-count supervision for all object categories. Our RLC framework further reduces the annotation cost arising from large numbers of object categories in a dataset by only using lower-count supervision for a subset of categories and class-labels for the remaining ones. The RLC framework extends our dual-branch LC framework with a novel weight modulation layer and a category-independent density map prediction. Experiments are performed on COCO, Visual Genome and PASCAL 2007 datasets. Our frameworks perform on par with state-of-the-art approaches using higher levels of supervision. Additionally, we demonstrate the applicability of our LC supervised density map for image-level supervised instance segmentation.

[1]  Luc Van Gool,et al.  Boosting Object Proposals: From Pascal to COCO , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Andrew Zisserman,et al.  Class-Agnostic Counting , 2018, ACCV.

[3]  Silvia L. Pintea,et al.  Divide and Count: Generic Object Counting by Image Divisions , 2019, IEEE Transactions on Image Processing.

[4]  Pascal Fua,et al.  Context-Aware Crowd Counting , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  G. Mandler,et al.  Subitizing: an analysis of its component processes. , 1982, Journal of experimental psychology. General.

[6]  D. Clements Subitizing: What Is It? Why Teach It?. , 1999 .

[7]  Trevor Darrell,et al.  Learning to Segment Every Thing , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Lior Wolf,et al.  A Dynamic Convolutional Layer for short rangeweather prediction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[10]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[11]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[12]  Fatih Murat Porikli,et al.  Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts , 2018, ACCV.

[13]  Hao Lu,et al.  From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Rama Chellappa,et al.  Zero-Shot Object Detection , 2018, ECCV.

[15]  Pushmeet Kohli,et al.  On Detection of Multiple Object Instances Using Hough Transforms , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Abe D. Hofman,et al.  The role of pattern recognition in children's exact enumeration of small numbers. , 2014, The British journal of developmental psychology.

[17]  Yuhong Li,et al.  CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  E. J. Capaldi,et al.  The Development of numerical competence : animal and human models , 1993 .

[20]  Baoyuan Wu,et al.  Residual Regression With Semantic Prior for Crowd Counting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Antoni B. Chan,et al.  Adaptive Density Map Generation for Crowd Counting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Melih Kandemir,et al.  Gaussian Process Density Counting from Weak Supervision , 2016, ECCV.

[23]  Suha Kwak,et al.  Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Guiguang Ding,et al.  Shallow Feature Based Dense Attention Network for Crowd Counting , 2020, AAAI.

[25]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[28]  Yihong Gong,et al.  Bayesian Loss for Crowd Count Estimation With Point Supervision , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Xiang Bai,et al.  Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[31]  Mark W. Schmidt,et al.  Where are the Blobs: Counting by Localization with Point Supervision , 2018, ECCV.

[32]  Trevor Darrell,et al.  LSDA: Large Scale Detection through Adaptation , 2014, NIPS.

[33]  Li Pan,et al.  ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Mark W. Schmidt,et al.  Where are the Masks: Instance Segmentation with Image-level Supervision , 2019, BMVC.

[35]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Pei Lv,et al.  Attention Scaling for Crowd Counting , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Guanbin Li,et al.  Crowd Counting With Deep Structured Scale Integration Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Fang Zhao,et al.  Dynamic Conditional Networks for Few-Shot Learning , 2018, ECCV.

[40]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Rynson W. H. Lau,et al.  Delving into Salient Object Subitizing and Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Qixiang Ye,et al.  Min-Entropy Latent Model for Weakly Supervised Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Wei Lin,et al.  Learning From Synthetic Data for Crowd Counting in the Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  F. Khan,et al.  Object Counting and Instance Segmentation With Image-Level Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Yadong Mu,et al.  Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Luc Van Gool,et al.  Dynamic Filter Networks , 2016, NIPS.

[47]  Tal Hassner,et al.  Precise Detection in Densely Packed Scenes , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Fei Su,et al.  Scale Aggregation Network for Accurate and Efficient Crowd Counting , 2018, ECCV.

[49]  Luc Van Gool,et al.  Convolutional Oriented Boundaries , 2016, ECCV.

[50]  Guangshuai Gao,et al.  CNN-based Density Estimation and Crowd Counting: A Survey , 2020, ArXiv.

[51]  David Doermann,et al.  Learning Instance Activation Maps for Weakly Supervised Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Deyu Meng,et al.  DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Yi Zhu,et al.  Soft Proposal Networks for Weakly Supervised Object Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[54]  Qiang Qiu,et al.  Weakly Supervised Instance Segmentation Using Class Peak Response , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Arie Shoshani,et al.  Optimizing connected component labeling algorithms , 2005, SPIE Medical Imaging.

[56]  Neil D. B. Bruce,et al.  Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[57]  Joost van de Weijer,et al.  Leveraging Unlabeled Data for Crowd Counting by Learning to Rank , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Pushmeet Kohli,et al.  On Detection of Multiple Object Instances Using Hough Transforms , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[59]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[60]  Margrit Betke,et al.  Salient Object Subitizing , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Antoni B. Chan,et al.  Incorporating Side Information by Adaptive Convolution , 2017, International Journal of Computer Vision.

[62]  Ling Shao,et al.  Relational Attention Network for Crowd Counting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[63]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Jitendra Malik,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation. , 2017, IEEE transactions on pattern analysis and machine intelligence.

[65]  Jia Xu,et al.  Learning to segment under various forms of weak supervision , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[67]  Bernt Schiele,et al.  Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[69]  Ramprasaath R. Selvaraju,et al.  Counting Everyday Objects in Everyday Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Larry S. Davis,et al.  C-WSL: Count-guided Weakly Supervised Localization , 2017, ECCV.

[71]  Alexander Hauptmann,et al.  Learning Spatial Awareness to Improve Crowd Counting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[72]  Saturnino Maldonado-Bascón,et al.  Extremely Overlapping Vehicle Counting , 2015, IbPRIA.

[73]  Qijun Chen,et al.  Revisiting Perspective Information for Efficient Crowd Counting , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).