Salient Object Subitizing

We study the problem of salient object subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing dataset of about 14K everyday images, annotated through an online crowdsourcing marketplace. We show that an end-to-end trained convolutional neural network (CNN) model achieves prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also performs significantly better than chance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval.
