Immediate, Scalable Object Category Detection

The objective of this work is object category detection in large-scale image datasets in the manner of Video Google: an object category is specified by a HOG classifier template, and retrieval is immediate at run time. We make three contributions: (i) a new image representation based on mid-level discriminative patches, designed for immediate object category detection and inverted file indexing; (ii) a sparse representation of a HOG classifier using a set of mid-level discriminative classifier patches; and (iii) a fast method for spatially reranking images based on their detections. We evaluate the detection method on the standard PASCAL VOC 2007 dataset, together with a 100K-image subset of ImageNet, and demonstrate near state-of-the-art detection performance at low ranks while maintaining immediate retrieval speeds. We also demonstrate applications that use an exemplar-SVM for pose-matched retrieval.
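To make the pairing of a sparse classifier with inverted file indexing concrete, the following is a minimal sketch and not the authors' implementation. It assumes each image has already been reduced to a small set of patch identifiers with response values, and that the HOG classifier has been approximated by a sparse set of weighted patch identifiers. The function names `build_inverted_index` and `score_images`, and the data layout, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(image_patches):
    """Map each patch id to the images that contain it.

    image_patches: {image_id: {patch_id: response}}  (assumed layout)
    returns:       {patch_id: [(image_id, response), ...]}
    """
    index = defaultdict(list)
    for image_id, patches in image_patches.items():
        for patch_id, response in patches.items():
            index[patch_id].append((image_id, response))
    return index


def score_images(index, sparse_query):
    """Rank images against a sparse classifier.

    sparse_query: {patch_id: weight}, the (assumed) sparse approximation
    of the classifier. Only posting lists for patches present in the
    query are visited, so cost scales with the query's sparsity rather
    than with the size of the dataset.
    """
    scores = defaultdict(float)
    for patch_id, weight in sparse_query.items():
        for image_id, response in index.get(patch_id, []):
            scores[image_id] += weight * response
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: three images, three patch ids.
image_patches = {
    "a": {0: 1.0, 2: 0.5},
    "b": {1: 2.0},
    "c": {2: 1.0},
}
index = build_inverted_index(image_patches)
ranked = score_images(index, {0: 1.0, 2: 3.0})
```

In this toy example image "b" is never touched, because none of its patches appear in the query. That is the property that makes inverted-file scoring attractive at scale. A spatial reranking step, as in contribution (iii), would then be applied only to the top of `ranked`.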
