The Sparse Manifold Transform

We present a signal representation framework called the sparse manifold transform that combines key ideas from sparse coding, manifold learning, and slow feature analysis. It turns non-linear transformations in the primary sensory signal space into linear interpolations in a representational embedding space while maintaining approximate invertibility. The sparse manifold transform is an unsupervised and generative framework that explicitly and simultaneously models the sparse discreteness and low-dimensional manifold structure found in natural scenes. When stacked, it also models hierarchical composition. We provide a theoretical description of the transform and demonstrate properties of the learned representation on both synthetic data and natural videos.
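The central claim, that a non-linear transformation in signal space can become a near-linear interpolation in an embedding of sparse codes, can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes a hypothetical 1-D signal (a Gaussian bump translating around a circle of `N` pixels), a dictionary `Phi` that is taken as known rather than learned, a crude 1-sparse code, and an embedding `P` chosen to minimize the second-order temporal difference of the embedded codes under a whitening constraint, an objective borrowed from the slow-feature-analysis family. All names and parameter values are illustrative.

```python
import numpy as np

N = 64        # number of pixels and of video frames (toy choice)
sigma = 1.0   # bump width in pixels (toy choice)

def bump(center):
    """Gaussian bump on a circle of N pixels, centered at `center`."""
    d = np.arange(N) - center
    d = (d + N // 2) % N - N // 2          # signed circular distance
    return np.exp(-d**2 / (2 * sigma**2))

# Synthetic "video": column t is the bump translated to position t.
X = np.stack([bump(t) for t in range(N)], axis=1)

# Assumption: the dictionary is given (one atom per translation), not learned.
Phi = X.copy()

# Crude 1-sparse code: activate only the best-matching atom for each frame.
idx = np.argmax(Phi.T @ X, axis=0)
A = np.zeros((N, N))
A[idx, np.arange(N)] = 1.0

# Circular second-order temporal difference operator:
# column t of (A @ D) equals a_{t-1} - 2 a_t + a_{t+1}.
I = np.eye(N)
D = -2 * I + np.roll(I, 1, axis=0) + np.roll(I, -1, axis=0)

# Minimize ||P A D||_F^2 subject to P (A A^T) P^T = I.
# In this toy case A A^T is the identity, so an ordinary symmetric
# eigendecomposition suffices; the rows of P are the eigenvectors
# with the smallest eigenvalues.
M = A @ D @ D.T @ A.T
eigvals, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
P = eigvecs[:, :3].T                       # 3-D embedding
B = P @ A                                  # embedded codes, one column per frame

def midpoint_error(Z):
    """Relative error when each frame is predicted as the mean of its neighbors."""
    mid = 0.5 * (np.roll(Z, 1, axis=1) + np.roll(Z, -1, axis=1))
    return np.linalg.norm(Z - mid) / np.linalg.norm(Z)

pixel_err = midpoint_error(X)   # linear interpolation in pixel space
embed_err = midpoint_error(B)   # linear interpolation in the embedding

print(f"pixel-space midpoint error: {pixel_err:.4f}")
print(f"embedding midpoint error:   {embed_err:.4f}")
```

In this sketch, averaging neighboring frames in pixel space blurs the bump rather than translating it, whereas the embedded trajectory is smooth enough that the average of the neighbors lies close to the true intermediate point. A full pipeline would replace the fixed dictionary and the 1-sparse code with learned sparse coding, and would need a whitening step whenever the code covariance is not the identity.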
