Learning explicit and implicit visual manifolds by information projection

Natural images contain a vast number of visual patterns distributed over a wide spectrum of subspaces of varying complexities and dimensions. Understanding the characteristics of these subspaces and their compositional structures is of fundamental importance for pattern modeling, learning and recognition. In this paper, we start with small image patches and define two types of atomic subspaces: explicit manifolds of low dimensions for structural primitives and implicit manifolds of high dimensions for stochastic textures. We then present an information-theoretic learning framework that derives common models for these manifolds through information projection, and study a manifold pursuit algorithm that clusters image patches into those atomic subspaces and ranks them according to their information gains. We further show how those atomic subspaces change over an image scaling process and how they are composed to form larger and more complex image patterns. Finally, we integrate the implicit and explicit manifolds to form a primal sketch model as a generic representation in early vision and to generate a hybrid image template representation for object category recognition in high-level vision. The study of the mathematical structures in the image space sheds light on some basic questions in human vision, such as the atomic elements in visual perception, the perceptual metrics in various manifolds, and the perceptual transitions over image scales. This paper is based on the J.K. Aggarwal Prize lecture by the first author at the International Conference on Pattern Recognition, Tampa, FL, 2008.
