Paper Information - From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

Visual counting, the task of predicting the number of objects in an image or video, is an open-set problem by nature: in theory, the object count can take any value in [0, +∞). In reality, however, the collected images and labeled count values are limited, so only a small closed set is observed. Existing methods typically model this task as regression, and they are likely to fail on unseen scenes whose counts fall outside the closed set. Counting, however, is decomposable: a dense region can always be divided until the counts of its sub-regions fall within the previously observed closed set. Inspired by this idea, we propose a simple but effective approach, the Spatial Divide-and-Conquer Network (S-DCNet). S-DCNet learns to classify closed-set counts and generalizes to open-set counts via spatial divide-and-conquer (S-DC). S-DCNet is also efficient: to avoid repeatedly computing convolutional features for sub-regions, S-DC is executed on the feature map rather than on the input image. S-DCNet achieves state-of-the-art performance on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), a vehicle counting dataset (TRANCOS) and a plant counting dataset (MTC). Compared to the previous best methods, S-DCNet brings a relative improvement of 20.2% on ShanghaiTech Part B, 20.9% on UCF-QNRF, 22.5% on TRANCOS and 15.1% on MTC. Code is available at: https://github.com/xhp-hust-2018-2011/S-DCNet.
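
The decomposability argument in the abstract can be illustrated with a small sketch. This is not the S-DCNet implementation: it assumes a binary dot-annotation map as input, and a hypothetical `closed_set_count` function stands in for a learned closed-set classifier by returning exact counts only up to an assumed bound `C_MAX` and saturating above it. The function names, the quadrant split, and the value of `C_MAX` are illustrative assumptions; the actual network performs the division on feature maps rather than on the image.

```python
import numpy as np

C_MAX = 20  # hypothetical upper bound of the observed closed set


def closed_set_count(region):
    """Stand-in for a closed-set counter: exact in [0, C_MAX), saturates at C_MAX."""
    return min(int(region.sum()), C_MAX)


def split(region):
    """Split a 2-D region into up to four sub-regions (halve each axis longer than 1)."""
    h, w = region.shape
    rows = np.array_split(region, 2, axis=0) if h > 1 else [region]
    parts = []
    for r in rows:
        parts.extend(np.array_split(r, 2, axis=1) if w > 1 else [r])
    return parts


def sdc_count(region):
    """Count by dividing any saturated region until every sub-region is in the closed set."""
    c = closed_set_count(region)
    # A saturated value may hide a larger count, so divide unless the region is one pixel.
    if c < C_MAX or region.size == 1:
        return c
    return sum(sdc_count(sub) for sub in split(region))


# Usage: 500 objects scattered on a 64x64 binary dot map (assumed input format).
rng = np.random.default_rng(0)
dots = np.zeros(64 * 64, dtype=np.int64)
dots[rng.choice(dots.size, size=500, replace=False)] = 1
dots = dots.reshape(64, 64)

print(closed_set_count(dots))  # 20: the closed-set counter alone saturates
print(sdc_count(dots))         # 500: division recovers the open-set count
```

In this toy setting the recursion is exact because every leaf either holds fewer than `C_MAX` objects or is a single pixel. A learned classifier would only approximate the per-region counts, so the sketch shows why division extends the countable range, not how accurate a trained model would be.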
