Computer Vision – ECCV 2012

Spatial pyramid matching (SPM) based pooling has been the dominant choice for state-of-the-art image classification systems. In contrast, we propose a novel object-centric spatial pooling (OCP) approach, following the intuition that knowing the location of the object of interest can be useful for image classification. OCP consists of two steps: (1) inferring the location of the objects, and (2) using the location information to pool foreground and background features separately to form the image-level representation. Step (1) is particularly challenging in a typical classification setting, where precise object location annotations are not available during training. To address this challenge, we propose a framework that learns object detectors using only image-level class labels, or so-called weak labels. We validate our approach on the challenging PASCAL07 dataset. Our learned detectors are comparable in accuracy with state-of-the-art weakly supervised detection methods. More importantly, the resulting OCP approach significantly outperforms SPM-based pooling in image classification.
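Step (2) above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the object location from step (1) is already available as a single bounding box, that each local feature has an (x, y) position and an encoded vector, and that max pooling is the pooling operator (the abstract does not say which operator is used). The names `object_centric_pool`, `max_pool`, and the `Box` format are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def max_pool(features: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise max over a set of vectors; all zeros if the set is empty."""
    if not features:
        return [0.0] * dim
    return [max(f[d] for f in features) for d in range(dim)]


def object_centric_pool(
    locations: Sequence[Tuple[float, float]],
    codes: Sequence[Sequence[float]],
    box: Box,
) -> List[float]:
    """Pool foreground and background features separately and concatenate.

    Assumption: a feature is foreground if its location lies inside `box`.
    """
    dim = len(codes[0])
    foreground, background = [], []
    for (x, y), code in zip(locations, codes):
        inside = box[0] <= x <= box[2] and box[1] <= y <= box[3]
        (foreground if inside else background).append(code)
    # Image-level representation: [pooled foreground | pooled background]
    return max_pool(foreground, dim) + max_pool(background, dim)


# Toy usage: two features fall inside the box, one falls outside.
locations = [(1.0, 1.0), (2.0, 2.0), (8.0, 8.0)]
codes = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]]
representation = object_centric_pool(locations, codes, box=(0.0, 0.0, 4.0, 4.0))
print(representation)  # [0.5, 0.9, 0.3, 0.4]
```

The resulting vector is twice the length of a single feature code, so a classifier trained on it can weight object and context evidence differently, which is the intuition the abstract gives for separating the two regions.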
