Discovering and Aligning Discriminative Mid-level Features for Image Classification

This paper proposes a new algorithm for image recognition, which consists of (i) modeling categories as a set of distinctive parts that are discovered automatically, (ii) aligning these parts across images while learning their visual models, and finally (iii) encoding images as sets of part descriptors. The resulting parts are free of any appearance constraint and are optimized to discriminate between the categories to be recognized. The algorithm starts by extracting a set of random regions from the images of different classes and, using a softassign-like matching algorithm, simultaneously learns the model of each part and assigns image regions to the model's parts. Once the category model is trained, it can classify new images by first finding the image regions similar to the learned parts and then encoding them with the Fisher-on-parts encoding, which is another contribution of this paper. The proposed framework is experimentally validated on two publicly available datasets, on which it obtains state-of-the-art performance.
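The abstract does not spell out the matching step. The following is a minimal sketch, under our own assumptions, of a generic softassign-style alignment that learns part models and assigns regions to parts at the same time. It is not the authors' implementation. Each part is assumed to be a plain prototype scored by negative squared Euclidean distance. A constant `slack_score` lets the regions that match no part fall into a slack (background) column, and the inverse temperature `beta` is annealed over an arbitrary schedule. The function names `softassign`, `align_parts` and `sq_dist`, and all default parameter values, are hypothetical.

```python
import math
import random


def softassign(scores, slack_score, beta, n_iters=100):
    """Softly match R regions to K parts (R >= K) by Sinkhorn normalization.

    scores[i][k] is how well region i fits part k. A slack column absorbs the
    R - K regions left unmatched, so every region carries total mass 1 and
    every real part receives total mass 1. Returns the R x K assignment matrix.
    """
    n_regions, n_parts = len(scores), len(scores[0])
    if n_regions < n_parts:
        raise ValueError("need at least as many regions as parts")
    col_target = [1.0] * n_parts + [float(n_regions - n_parts)]
    m = []
    for row in scores:
        logits = [beta * s for s in row] + [beta * slack_score]
        top = max(logits)  # subtract the row max to avoid overflow in exp
        m.append([math.exp(v - top) for v in logits])
    for _ in range(n_iters):
        for row in m:  # rows: each region spreads mass 1 over parts + slack
            total = sum(row)
            for j in range(len(row)):
                row[j] /= total
        for j, target in enumerate(col_target):  # columns: match part marginals
            total = max(sum(row[j] for row in m), 1e-300)
            for row in m:
                row[j] *= target / total
    return [row[:n_parts] for row in m]  # drop the slack column


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_parts(images, n_parts, betas=(0.5, 1.0, 2.0, 4.0, 8.0),
                slack_score=-1.0, seed=0):
    """Alternate soft region-to-part assignment and part-model updates.

    images: one list of region descriptors (lists of floats) per image.
    Parts are plain prototypes, a hypothetical stand-in for discriminative
    part models. Returns the prototypes and the last per-image assignments.
    """
    rng = random.Random(seed)
    pool = [region for regions in images for region in regions]
    parts = [list(p) for p in rng.sample(pool, n_parts)]  # random init
    assignments = []
    for beta in betas:  # deterministic annealing: sharpen matches gradually
        assignments = [
            softassign([[-sq_dist(r, p) for p in parts] for r in regions],
                       slack_score, beta)
            for regions in images
        ]
        for k in range(n_parts):  # each part = assignment-weighted mean
            weight = sum(row[k] for a in assignments for row in a)
            parts[k] = [
                sum(row[k] * region[d]
                    for a, regions in zip(assignments, images)
                    for row, region in zip(a, regions))
                / max(weight, 1e-12)
                for d in range(len(parts[k]))
            ]
    return parts, assignments
```

As `beta` grows, each image's assignment approaches a hard match of one region per part, and the slack column handles surplus regions. In the abstract the parts are optimized to discriminate between categories, so a faithful version would probably replace the weighted-mean update with a classifier-based update. This sketch does not attempt that.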