Learning structural element patch models with hierarchical palettes

Image patches can be factorized into `shapelets' that describe segmentation patterns called structural elements (stels), and palettes that describe how to paint the shapelets. We introduce local palettes for patches, global palettes for entire images and universal palettes for image collections. Using a learned shapelet library, patches from a test image can be analyzed using a variational technique to produce an image descriptor that represents local shapes and colors separately. We show that the shapelet model performs better than SIFT, Gist and the standard stel method on Caltech28 and is very competitive with other methods on Caltech101.

[1]  David Marr,et al.  VISION A Computational Investigation into the Human Representation and Processing of Visual Information , 2009 .

[2]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[3]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[4]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  N. Jojic,et al.  Capturing image structure with probabilistic index maps , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[6]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[7]  Richard S. Zemel,et al.  Learning Parts-Based Representations of Data , 2006, J. Mach. Learn. Res..

[8]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[10]  Li Fei-Fei,et al.  Spatially coherent latent topic model for concurrent object segmentation and classification , 2007 .

[11]  Fei-Fei Li,et al.  Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Brendan J. Frey,et al.  Stel component analysis: Modeling spatial correlations in image class structure , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Nebojsa Jojic,et al.  Object Recognition with Hierarchical Stel Models , 2010, ECCV.

[16]  Nebojsa Jojic,et al.  Structural epitome: a way to summarize one's visual experience , 2010, NIPS.

[17]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[19]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[20]  Christopher K. I. Williams,et al.  Factored Shapes and Appearances for Parts-based Object Understanding , 2011, BMVC.

[21]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[22]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.