Paper information - Towards Partial Supervision for Generic Object Counting in Natural Scenes

Towards Partial Supervision for Generic Object Counting in Natural Scenes

Generic object counting in natural scenes is a challenging computer vision problem. Existing approaches rely on either instance-level supervision or absolute count information to train a generic object counter. We introduce a partially supervised setting that significantly reduces the supervision level required for generic object counting. We propose two novel frameworks, named lower-count (LC) and reduced lower-count (RLC), to enable object counting under this setting. Our frameworks are built on a novel dual-branch architecture that has an image classification branch and a density branch. Our LC framework reduces the annotation cost due to multiple instances in an image by using only lower-count supervision for all object categories. Our RLC framework further reduces the annotation cost arising from large numbers of object categories in a dataset by using lower-count supervision for only a subset of categories and class labels for the remaining ones. The RLC framework extends our dual-branch LC framework with a novel weight modulation layer and a category-independent density map prediction. Experiments are performed on the COCO, Visual Genome, and PASCAL VOC 2007 datasets. Our frameworks perform on par with state-of-the-art approaches that use higher levels of supervision. Additionally, we demonstrate the applicability of our LC supervised density map for image-level supervised instance segmentation.
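
The difference between the two supervision levels can be illustrated with a small label-encoding sketch. This is not the authors' code. It assumes that "lower-count supervision" means exact counts are annotated only up to a small threshold, and that anything above it is recorded merely as "beyond the threshold". The function names, the `BEYOND` marker, and the default threshold of 4 are hypothetical choices made for illustration.

```python
from typing import Dict, Set, Union

# Hypothetical marker meaning "more instances than the threshold".
BEYOND = "beyond"

Label = Union[int, str, bool]


def lower_count_labels(counts: Dict[str, int], threshold: int = 4) -> Dict[str, Label]:
    """LC-style labels (assumed): exact count up to `threshold`, else BEYOND."""
    return {cat: (n if n <= threshold else BEYOND) for cat, n in counts.items()}


def reduced_lower_count_labels(
    counts: Dict[str, int], counted: Set[str], threshold: int = 4
) -> Dict[str, Label]:
    """RLC-style labels (assumed): lower-count labels for categories in
    `counted`, and only a present/absent class label for all others."""
    labels: Dict[str, Label] = {}
    for cat, n in counts.items():
        if cat in counted:
            labels[cat] = n if n <= threshold else BEYOND
        else:
            labels[cat] = n > 0  # class label only: is the category present?
    return labels


if __name__ == "__main__":
    # Made-up per-category instance counts for a single image.
    image_counts = {"person": 7, "dog": 2, "car": 0, "cup": 5}
    print(lower_count_labels(image_counts))
    print(reduced_lower_count_labels(image_counts, counted={"person", "dog"}))
```

Under these assumptions, an annotator never has to count past the threshold for any category, and in the reduced setting only marks presence or absence for the categories outside the counted subset.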
