Multilevel Image Coding with Hyperfeatures

Abstract

Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminative and resist local occlusion as well as geometric and photometric variation, but they cannot exploit spatial co-occurrence statistics at scales larger than the local input patches. We present a multilevel visual representation that remedies this. The starting point is the observation that, to detect object parts in images, it often suffices in practice to detect co-occurrences of more local object fragments. This can be formalized by coding image patches against a codebook of known fragments, or against a more general statistical model, and locally histogramming the resulting labels to capture their co-occurrence statistics. Local patch descriptors are thus converted into somewhat less local histograms over label occurrences. These histograms are themselves local descriptor vectors, so the process can be iterated to code ever larger assemblies of object parts and increasingly abstract or ‘semantic’ image properties. We call these higher-level descriptors “hyperfeatures”. We formulate the hyperfeature model and study its performance under several image coding methods, including k-means-based vector quantization, Gaussian mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features improve performance on several object and texture image classification tasks.
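The iteration described above (code local descriptors against a codebook, histogram the labels over a neighbourhood, treat those histograms as the next level's descriptors) can be illustrated with a minimal sketch. It assumes hard k-means vector quantization as the coding method, a dense grid of descriptors, and a square pooling window with a fixed stride; the function names (`kmeans`, `vq_labels`, `local_histograms`, `hyperfeature_stack`) and the parameters `k`, `win` and `stride` are hypothetical choices for illustration, not the authors' implementation. For brevity each level's codebook is learned from the same image it encodes; a real system would normally learn codebooks from a set of training images.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means on rows of X; returns a (k, d) array of centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every vector to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(0)
    return centers

def vq_labels(F, centers):
    """Hard-assign each descriptor of an (H, W, d) grid to its nearest center."""
    H, W, d = F.shape
    flat = F.reshape(-1, d)
    d2 = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1).reshape(H, W)

def local_histograms(labels, k, win=3, stride=2):
    """Normalized label histogram of each win x win window: (H', W', k)."""
    H, W = labels.shape
    rows = []
    for i in range(0, H - win + 1, stride):
        row = []
        for j in range(0, W - win + 1, stride):
            patch = labels[i:i + win, j:j + win].ravel()
            h = np.bincount(patch, minlength=k).astype(float)
            row.append(h / h.sum())
        rows.append(row)
    return np.array(rows)

def hyperfeature_stack(F, levels=3, k=8, win=3, stride=2, seed=0):
    """Return one global label histogram per level (bag of labels at each level)."""
    per_level = []
    for lvl in range(levels):
        centers = kmeans(F.reshape(-1, F.shape[-1]), k, seed=seed + lvl)
        labels = vq_labels(F, centers)
        per_level.append(np.bincount(labels.ravel(), minlength=k) / labels.size)
        if lvl + 1 < levels:
            # The local label histograms become the next level's descriptors.
            F = local_histograms(labels, k, win, stride)
    return per_level

# Usage: random vectors stand in for real local appearance descriptors.
rng = np.random.default_rng(1)
F0 = rng.random((16, 16, 8))
image_descriptor = np.concatenate(hyperfeature_stack(F0, levels=3, k=8))
```

Concatenating the per-level histograms gives one image-level vector that could be fed to a classifier. Swapping `vq_labels` for soft posteriors from a Gaussian mixture would make each window accumulate posterior weights instead of hard counts; that variant is not shown here.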
