A probabilistic model for recursive factorized image features

Layered representations for object recognition are important due to their increased invariance, biological plausibility, and computational benefits. However, most of existing approaches to hierarchical representations are strictly feedforward, and thus not well able to resolve local ambiguities. We propose a probabilistic model that learns and infers all layers of the hierarchy jointly. Specifically, we suggest a process of recursive probabilistic factorization, and present a novel generative model based on Latent Dirichlet Allocation to this end. The approach is tested on a standard recognition dataset, outperforming existing hierarchical approaches and demonstrating performance on par with current single-feature state-of-the-art models. We demonstrate two important properties of our proposed model: 1) adding an additional layer to the representation increases performance over the flat model; 2) a full Bayesian approach outperforms a feedforward implementation of the model.

[1]  Bill Triggs,et al.  Multilevel Image Coding with Hyperfeatures , 2008, International Journal of Computer Vision.

[2]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Long Zhu,et al.  Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion , 2008, ECCV.

[4]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  David G. Lowe,et al.  University of British Columbia. , 1945, Canadian Medical Association journal.

[6]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[7]  Sanja Fidler,et al.  Similarity-based cross-layered hierarchical representation for object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[11]  Geoffrey E. Hinton Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.

[12]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[13]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[14]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[15]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Yihong Gong,et al.  Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks , 2008, ECCV.

[17]  Wei Li,et al.  Pachinko allocation: DAG-structured mixture models of topic correlations , 2006, ICML.

[18]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[19]  Sanja Fidler,et al.  Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Gregor Heinrich Parameter estimation for text analysis , 2009 .

[21]  Garrison W. Cottrell,et al.  Robust classification of objects, faces, and flowers using natural image statistics , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Trevor Darrell,et al.  An Additive Latent Feature Model for Transparent Object Recognition , 2009, NIPS.

[23]  Gustavo Deco,et al.  Computational neuroscience of vision , 2002 .

[24]  Anitha Pasupathy,et al.  Transformation of shape information in the ventral pathway , 2007, Current Opinion in Neurobiology.

[25]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[26]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[27]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[28]  S. Ullman Object recognition and segmentation by a fragment-based hierarchy , 2007, Trends in Cognitive Sciences.

[29]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[30]  Joachim M. Buhmann,et al.  Learning the Compositional Nature of Visual Objects , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[33]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).