Class-Agnostic Counting

Nearly all existing counting methods are designed for a specific object class. Our work, however, aims to create a counting model able to count any class of object. To achieve this goal, we formulate counting as a matching problem, enabling us to exploit the image self-similarity property that naturally exists in object counting problems. We make the following three contributions: first, a Generic Matching Network (GMN) architecture that can potentially count any object in a class-agnostic manner; second, by reformulating the counting problem as one of matching objects, we can take advantage of the abundance of video data labeled for tracking, which contains natural repetitions suitable for training a counting model. Such data enables us to train the GMN. Third, to customize the GMN to different user requirements, an adapter module is used to specialize the model with minimal effort, i.e. using a few labeled examples, and adapting only a small fraction of the trained parameters. This is a form of few-shot learning, which is practical for domains where labels are limited due to requiring expert knowledge (e.g. microbiology). We demonstrate the flexibility of our method on a diverse set of existing counting benchmarks: specifically cells, cars, and human crowds. The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state-of-the-art on the car dataset using only three training images. When training on the entire dataset, the proposed method outperforms all previous methods by a large margin.

[1]  Pekka Ruusuvuori,et al.  Computational Framework for Simulating Fluorescence Microscope Images With Cell Populations , 2007, IEEE Transactions on Medical Imaging.

[2]  Haroon Idrees,et al.  Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds , 2018, ECCV.

[3]  Pushmeet Kohli,et al.  On Detection of Multiple Object Instances Using Hough Transforms , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Xiaogang Wang,et al.  Cross-scene crowd counting via deep convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Tommy W. S. Chow,et al.  A neural-based crowd estimation by hybrid global learning algorithm , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[6]  Michal Irani,et al.  Super-resolution from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Ullrich Köthe,et al.  Learning to count with regression forest and structured labels , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[8]  Hieu Le,et al.  Iterative Crowd Counting , 2018, ECCV.

[9]  Ognjen Arandjelovic,et al.  Crowd Detection from Still Images , 2008, BMVC.

[10]  Andrew Zisserman,et al.  Counting in the Wild , 2016, ECCV.

[11]  Andrew Zisserman,et al.  Learning To Count Objects in Images , 2010, NIPS.

[12]  Hai Tao,et al.  A Viewpoint Invariant Approach for Crowd Counting , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[13]  William T. Freeman,et al.  Best-Buddies Similarity for robust template matching , 2015, CVPR.

[14]  Shenghua Gao,et al.  Single-Image Crowd Counting via Multi-Column Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[16]  Andrew Zisserman,et al.  Microscopy cell counting with fully convolutional regression networks , 2015 .

[17]  Jean-Michel Morel,et al.  A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Sung Yong Shin,et al.  On pixel-based texture synthesis by non-parametric sampling , 2006, Comput. Graph..

[19]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[20]  Julian Yarkony,et al.  Cell Detection and Segmentation Using Correlation Clustering , 2014, MICCAI.

[21]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[22]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[25]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Noel E. O'Connor,et al.  People, Penguins and Petri Dishes: Adapting Object Counting Models to New Visual Domains and Object Types Without Forgetting , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Andrew Zisserman,et al.  Interactive Object Counting , 2014, ECCV.

[29]  Winston H. Hsu,et al.  Drone-Based Object Counting by Spatially Regularized Regional Proposal Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Jitendra Malik,et al.  Detecting, localizing and grouping repeated scene elements from an image , 1996, ECCV.

[32]  Andrea Vedaldi,et al.  Efficient Parametrization of Multi-domain Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  A. Marana,et al.  Estimation of crowd density using image processing , 1997 .

[34]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[35]  Wesam A. Sakla,et al.  A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning , 2016, ECCV.

[36]  Andrew Zisserman,et al.  Detecting overlapping instances in microscopy images using extremal region trees , 2016, Medical Image Anal..

[37]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Pushmeet Kohli,et al.  On Detection of Multiple Object Instances Using Hough Transforms , 2012, IEEE Trans. Pattern Anal. Mach. Intell..