Learning Visual Spatial Pooling by Strong PCA Dimension Reduction

In visual modeling, invariance properties of visual cells are often explained by a pooling mechanism, in which the outputs of neurons with similar selectivity for some stimulus parameters are integrated to gain a degree of invariance to other parameters. For example, the classical energy model of phase-invariant V1 complex cells pools model simple cells that prefer similar orientations but different phases. Prior studies, such as independent subspace analysis, have shown that the phase-invariance properties of V1 complex cells can be learned from the spatial statistics of natural inputs. However, those approaches assumed a squaring nonlinearity on the neural outputs to capture energy correlation; this nonlinearity is arguably unnatural from a neurobiological viewpoint, yet hard to change because it is tightly integrated into their formalisms. Moreover, they used rather complicated objective functions that are computationally expensive to optimize. In this study, we show that visual spatial pooling can be learned in a much simpler way, using strong dimension reduction based on principal component analysis. This approach learns to ignore a large part of the detailed spatial structure of the input and thereby estimates a linear pooling matrix. Using this framework, we demonstrate that pooling of model V1 simple cells learned in this way, even with nonlinearities other than squaring, can reproduce standard tuning properties of V1 complex cells. For further understanding, we analyze several variants of the pooling model and argue that a reasonable pooling can generally be obtained from any linear transformation that retains several of the first principal components and suppresses the remaining ones. In particular, we show how classic Wiener filtering theory leads to one such variant.
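To make the idea of "retaining the first principal components and suppressing the rest" concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It builds two linear pooling matrices from a matrix of unit outputs: a hard projection onto the top `k` principal components, and a soft shrinkage of the form `lambda / (lambda + noise_var)`, which is the standard Wiener gain under additive white noise. The function names, the random stand-in data, the choice `k=3`, and `noise_var=1.0` are all assumptions made for illustration; the abstract does not specify these details, and the paper's exact Wiener variant may differ.

```python
import numpy as np


def pca_basis(X):
    """Return eigenvalues (descending) and eigenvectors of the covariance of X.

    X has shape (n_samples, n_units); each column is one unit's output.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def pca_pooling_matrix(X, k):
    """Hard variant: orthogonal projection onto the top-k principal components."""
    _, E = pca_basis(X)
    Ek = E[:, :k]
    return Ek @ Ek.T


def wiener_pooling_matrix(X, noise_var):
    """Soft variant: scale each principal component by lambda / (lambda + noise_var)."""
    lam, E = pca_basis(X)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    gain = lam / (lam + noise_var)
    return (E * gain) @ E.T  # E @ diag(gain) @ E.T


# Hypothetical stand-in for rectified model simple cell outputs (not real data).
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((2000, 16)) @ rng.standard_normal((16, 16)))

P = pca_pooling_matrix(X, k=3)               # strong reduction: 16 -> 3 dimensions
W = wiener_pooling_matrix(X, noise_var=1.0)  # smooth suppression of weak components
pooled = (X - X.mean(axis=0)) @ P.T          # pooled responses, one row per sample
```

Both matrices share the same eigenvectors and differ only in how sharply they attenuate the trailing components, which is the sense in which the two variants belong to one family.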
