BING: Binarized normed gradients for objectness estimation at 300fps

Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding-window object detection paradigm. We observe that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, after suitably resizing their corresponding image windows to a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and the number of color spaces used for computing BING features, our performance can be further improved to 99.5% DR.
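
The 64D feature described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour resizing, the central-difference gradient, the clamp to 255, and the 4-bit truncation are all assumptions made for illustration, and the names `resize_window`, `normed_gradient`, `top_bits`, and `objectness_score` are hypothetical. The linear weights would come from a trained model, which is not shown here.

```python
def resize_window(gray, x0, y0, x1, y1, size=8):
    """Resize the window [x0, x1) x [y0, y1) of a grayscale image
    (list of rows) to size x size by nearest-neighbour sampling (assumed)."""
    return [
        [gray[y0 + (i * (y1 - y0)) // size][x0 + (j * (x1 - x0)) // size]
         for j in range(size)]
        for i in range(size)
    ]


def normed_gradient(patch):
    """Flattened gradient norm of a square patch, one value per pixel.

    Assumed form: min(|gx| + |gy|, 255) with central differences that are
    clamped at the patch border. For an 8 x 8 patch this gives 64 values.
    """
    h, w = len(patch), len(patch[0])
    feat = []
    for y in range(h):
        for x in range(w):
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            feat.append(min(abs(gx) + abs(gy), 255))
    return feat


def top_bits(feat, n_bits=4):
    """Keep only the top n_bits of each 8-bit value (an assumed binarization
    step), using nothing but bitwise shifts."""
    shift = 8 - n_bits
    return [(v >> shift) << shift for v in feat]


def objectness_score(feat, weights):
    """Linear score: dot product of the feature with (hypothetical) weights."""
    return sum(f * w for f, w in zip(feat, weights))


# Toy image: a bright 8 x 8 square on a dark 16 x 16 background.
image = [[200 if 4 <= x < 12 and 4 <= y < 12 else 0 for x in range(16)]
         for y in range(16)]
feature = normed_gradient(resize_window(image, 0, 0, 16, 16))
coarse = top_bits(feature)
score = objectness_score(coarse, [1] * 64)  # placeholder weights, not learned
```

In this toy case the feature responds only along the boundary of the square, which is the closed-boundary cue that the observation above relies on.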
