Paper information - BING: Binarized normed gradients for objectness estimation at 300fps

Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with a well-defined closed boundary can be discriminated by looking at the norm of gradients, after suitably resizing their corresponding image windows to a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and the color spaces used for computing BING features, our performance can be further improved to 99.5% DR.
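
The feature described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the nearest-neighbour resize, the gradient magnitude approximated as min(|gx| + |gy|, 255), the choice of keeping the top 4 bits, and all function names are assumptions made for illustration only.

```python
import numpy as np

def normed_gradient(patch):
    """Gradient magnitude per pixel, approximated as min(|gx| + |gy|, 255).

    Assumption: central differences with zero gradient on the border.
    """
    p = patch.astype(np.int32)
    gx = np.zeros_like(p)
    gy = np.zeros_like(p)
    gx[:, 1:-1] = p[:, 2:] - p[:, :-2]   # horizontal central difference
    gy[1:-1, :] = p[2:, :] - p[:-2, :]   # vertical central difference
    return np.minimum(np.abs(gx) + np.abs(gy), 255).astype(np.uint8)

def window_feature(gray, x0, y0, x1, y1, size=8):
    """Resize gray[y0:y1, x0:x1] to size x size (nearest neighbour),
    then return the normed gradients as a flat 64D uint8 vector."""
    ys = y0 + (np.arange(size) * (y1 - y0)) // size
    xs = x0 + (np.arange(size) * (x1 - x0)) // size
    patch = gray[np.ix_(ys, xs)]
    return normed_gradient(patch).reshape(-1)

def binarize(feature, n_bits=4):
    """Split each byte into its top n_bits binary planes: shape (n_bits, 64)."""
    return np.stack([(feature >> (7 - k)) & 1 for k in range(n_bits)])

def approx_from_bits(planes):
    """Rebuild the approximate feature using only shifts and adds."""
    return sum(planes[k].astype(np.int32) << (7 - k)
               for k in range(planes.shape[0]))

# Hypothetical toy image: a bright square on a dark background.
gray = np.zeros((32, 32), dtype=np.uint8)
gray[8:24, 8:24] = 200
feat = window_feature(gray, 0, 0, 32, 32)
planes = binarize(feat)
approx = approx_from_bits(planes)
```

In this sketch the binarized planes reproduce each feature value up to its four low-order bits, which is what makes a shift-and-add evaluation possible. A learned linear objectness score would then be a dot product between a weight vector and `feat` (or its binarized approximation); the weights themselves are outside the scope of this example.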
