Blocks That Shout: Distinctive Parts for Scene Classification

The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.

[1]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  Yali Amit,et al.  Generative Models for Labeling Multi-object Configurations in Images , 2006, Toward Category-Level Object Recognition.

[3]  Fei-Fei Li,et al.  Large Margin Learning of Upstream Scene Understanding Models , 2010, NIPS.

[4]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[8]  Pietro Perona,et al.  Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[9]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[11]  Iasonas Kokkinos,et al.  Discovering discriminative action parts from mid-level video representations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[13]  C. V. Jawahar,et al.  The truth about cats and dogs , 2011, 2011 International Conference on Computer Vision.

[14]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[15]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[16]  Pedro F. Felzenszwalb,et al.  Reconfigurable models for scene recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Fereshteh Sadeghi,et al.  Latent Pyramidal Regions for Recognizing Scenes , 2012, ECCV.

[19]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[20]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[21]  Tomasz Malisiewicz,et al.  A Gaussian Approximation of Feature Space for Fast Image Similarity , 2012 .

[22]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[23]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[24]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Jitendra Malik,et al.  Discriminative Decorrelation for Clustering and Classification , 2012, ECCV.

[28]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[29]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[31]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[32]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[35]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[36]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[37]  Michel Vidal-Naquet,et al.  A Fragment-Based Approach to Object Representation and Classification , 2001, IWVF.