Looc: Localize Overlapping Objects with Count Supervision

Acquiring count annotations generally requires less human effort than point-level and bounding box annotations. Thus, we propose the novel problem setup of localizing objects in dense scenes under this weaker supervision. We propose LOOC, a method to Localize Overlapping Objects with Count supervision. We train LOOC by alternating between two stages. In the first stage, LOOC learns to generate pseudo point-level annotations in a semi-supervised manner. In the second stage, LOOC uses a fully-supervised localization method that trains on these pseudo labels. The localization method is used to progressively improve the quality of the pseudo labels. We conducted experiments on popular counting datasets. For localization, LOOC achieves a strong new baseline in the novel problem setup where only count supervision is available. For counting, LOOC outperforms current state-of-the-art methods that only use count as their supervision.

[1]  Chang Liu,et al.  C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Wenyu Liu,et al.  Multiple Instance Detection Network with Online Instance Classifier Refinement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Andrew Zisserman,et al.  Counting in the Wild , 2016, ECCV.

[4]  Sharan Vaswani,et al.  Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence , 2020, ArXiv.

[5]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Qiang Qiu,et al.  Weakly Supervised Instance Segmentation Using Class Peak Response , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Alexandre Lacoste,et al.  Embedding Propagation: Smoother Manifold for Few-Shot Classification , 2020, ECCV.

[8]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[9]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Mark W. Schmidt,et al.  Where are the Blobs: Counting by Localization with Point Supervision , 2018, ECCV.

[11]  Luc Van Gool,et al.  Efficient Non-Maximum Suppression , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[12]  Yuhong Li,et al.  CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[14]  Saturnino Maldonado-Bascón,et al.  Extremely Overlapping Vehicle Counting , 2015, IbPRIA.

[15]  Ramprasaath R. Selvaraju,et al.  Counting Everyday Objects in Everyday Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Mark W. Schmidt,et al.  Instance Segmentation with Point Supervision , 2019, ArXiv.

[17]  Shaogang Gong,et al.  Feature Mining for Localised Crowd Counting , 2012, BMVC.

[18]  Larry S. Davis,et al.  C-WSL: Count-guided Weakly Supervised Localization , 2017, ECCV.

[19]  Ilkay Öksüz,et al.  A Topological Loss Function for Deep-Learning Based Image Segmentation Using Persistent Homology , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[21]  Aaron Mishkin,et al.  Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates , 2019, NeurIPS.

[22]  Mark W. Schmidt,et al.  Where are the Masks: Instance Segmentation with Image-level Supervision , 2019, BMVC.

[23]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[24]  Daniel Oñoro-Rubio,et al.  Towards Perspective-Free Object Counting with Deep Learning , 2016, ECCV.

[25]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[26]  Nuno Vasconcelos,et al.  Privacy preserving crowd monitoring: Counting people without people models or tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[29]  F. Khan,et al.  Object Counting and Instance Segmentation With Image-Level Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[32]  Wenyu Liu,et al.  Weakly Supervised Region Proposal Network and Object Detection , 2018, ECCV.

[33]  Frederik Kunstner,et al.  Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search) , 2020, ArXiv.

[34]  Ioannis Mitliagkas,et al.  Manifold Mixup: Better Representations by Interpolating Hidden States , 2018, ICML.

[35]  Ming-Hsuan Yang,et al.  Weakly Supervised Object Localization with Progressive Domain Adaptation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).