Bilinear Sparse Coding for Invariant Vision

Recent algorithms for sparse coding and independent component analysis (ICA) have demonstrated how localized features can be learned from natural images. However, these approaches do not take image transformations into account. We describe an unsupervised algorithm for learning both localized features and their transformations directly from images using a sparse bilinear generative model. We show that, given an arbitrary set of natural images, the algorithm produces oriented basis filters that can simultaneously represent features in an image and their transformations. The learned generative model can be used to translate features to different locations. This reduces the need to learn the same feature separately at multiple locations, a redundancy that limits previous approaches to sparse coding and ICA. Our results suggest that by explicitly modeling the interaction between local image features and their transformations, the sparse bilinear approach can provide a basis for achieving transformation-invariant vision.
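The abstract describes the sparse bilinear generative model only in words. The following is a minimal Python sketch under assumptions not stated in the abstract. It assumes the standard bilinear form I_k = sum_i sum_j w_ijk x_i y_j, where the x_i weight features and the y_j weight transformations (here, shifts). It also assumes an L1 sparseness penalty, and it uses iterative soft-thresholding (ISTA) as the inference method, which may not be the optimizer the authors used. The tensor W is hand-built from two hypothetical 1-D features placed at four shifts rather than learned, and all helper names (generate, design_for_x, design_for_y, ista) are illustrative. Holding x fixed while changing y moves the same feature to a new location, which is the translation property the abstract claims for the learned model.

```python
import math

# Toy bilinear generative model (a sketch under stated assumptions):
#   I[k] = sum_i sum_j W[i][j][k] * x[i] * y[j]
# x weights features, y weights transformations (here: 1-D shifts).


def shifted(feature, shift, n_pix):
    """Place a 1-D feature at a given offset inside an n_pix-long zero image."""
    img = [0.0] * n_pix
    for t, val in enumerate(feature):
        img[shift + t] = val
    return img


def onehot(index, size):
    return [1.0 if m == index else 0.0 for m in range(size)]


# Hypothetical hand-built W (NOT learned): 2 features x 4 shifts x 6 pixels.
N_PIX, N_SHIFT = 6, 4
FEATURES = [[1.0, 2.0, 1.0],   # hypothetical "blob"
            [1.0, 0.0, -1.0]]  # hypothetical "edge"
W = [[shifted(f, j, N_PIX) for j in range(N_SHIFT)] for f in FEATURES]


def generate(W, x, y):
    """Render an image from feature coefficients x and transformation coefficients y."""
    img = [0.0] * len(W[0][0])
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            for k, w in enumerate(W[i][j]):
                img[k] += w * xi * yj
    return img


def energy(I, W, x, y, lam):
    """0.5 * squared reconstruction error + lam * L1 penalty (assumed sparseness prior)."""
    rec = generate(W, x, y)
    err = 0.5 * sum((a - b) ** 2 for a, b in zip(I, rec))
    return err + lam * (sum(map(abs, x)) + sum(map(abs, y)))


def design_for_x(W, y):
    """With y fixed the model is linear in x: I ~ A x, A[k][i] = sum_j W[i][j][k] y[j]."""
    return [[sum(W[i][j][k] * y[j] for j in range(len(y))) for i in range(len(W))]
            for k in range(len(W[0][0]))]


def design_for_y(W, x):
    """With x fixed the model is linear in y: I ~ A y, A[k][j] = sum_i W[i][j][k] x[i]."""
    return [[sum(W[i][j][k] * x[i] for i in range(len(x))) for j in range(len(W[0]))]
            for k in range(len(W[0][0]))]


def ista(I, A, lam, steps=200, eta=0.1):
    """Minimize 0.5*||I - A c||^2 + lam*||c||_1 by iterative soft-thresholding."""
    n_pix, n_coef = len(A), len(A[0])
    c = [0.0] * n_coef
    for _ in range(steps):
        # Residual r = I - A c under the current coefficients.
        r = [I[k] - sum(A[k][m] * c[m] for m in range(n_coef)) for k in range(n_pix)]
        new_c = []
        for m in range(n_coef):
            # Gradient step on the squared error (A^T r is the negative gradient).
            g = c[m] + eta * sum(A[k][m] * r[k] for k in range(n_pix))
            # Soft-thresholding: the proximal operator of the L1 penalty.
            new_c.append(math.copysign(max(abs(g) - eta * lam, 0.0), g))
        c = new_c
    return c


if __name__ == "__main__":
    blob = [1.0, 0.0]
    # Same feature code x, two different transformation codes y: the blob moves.
    print(generate(W, blob, onehot(0, N_SHIFT)))  # [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    print(generate(W, blob, onehot(2, N_SHIFT)))  # [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
    # Infer the shift of an observed edge with the feature code held fixed.
    I = generate(W, [0.0, 1.0], onehot(2, N_SHIFT))
    print(ista(I, design_for_y(W, [0.0, 1.0]), lam=0.01))  # weight of about 0.995 on shift 2
```

Because the model is linear in x when y is held fixed, and linear in y when x is held fixed, one option for joint inference is to alternate the two ista calls. The L1 penalty shrinks the recovered coefficient slightly below 1 (to 1 - lam/2 in this toy case). This sketch does not cover learning W itself.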