Paper Information - The Sparse Manifold Transform

The Sparse Manifold Transform

We present a signal representation framework called the sparse manifold transform that combines key ideas from sparse coding, manifold learning, and slow feature analysis. It turns non-linear transformations in the primary sensory signal space into linear interpolations in a representational embedding space while maintaining approximate invertibility. The sparse manifold transform is an unsupervised and generative framework that explicitly and simultaneously models the sparse discreteness and low-dimensional manifold structure found in natural scenes. When stacked, it also models hierarchical composition. We provide a theoretical description of the transform and demonstrate properties of the learned representation on both synthetic data and natural videos.
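The central claim, that a non-linear transformation of the signal becomes linear interpolation in the embedding space, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the sparse codes are idealized one-hot vectors (a single feature sliding around a ring of `n` positions), the embedding `P` is chosen to minimize the second-order temporal difference of the embedded codes under a whitening constraint (a formulation in the spirit of slow feature analysis), and the names `second_difference_operator`, `learn_embedding`, and `midpoint_residual` are hypothetical helpers.

```python
import numpy as np


def second_difference_operator(T):
    """Return D (T x T) such that column t of A @ D equals
    0.5 * a[t-1] - a[t] + 0.5 * a[t+1], with circular wraparound."""
    D = np.zeros((T, T))
    for t in range(T):
        D[(t - 1) % T, t] = 0.5
        D[t, t] = -1.0
        D[(t + 1) % T, t] = 0.5
    return D


def learn_embedding(A, f):
    """Assumed objective: minimize ||P A D||_F^2 subject to P V P^T = I,
    where A (n_atoms x T) holds sparse codes over time and V = A A^T / T.
    Solved by whitening with V, then keeping the f smallest eigenvectors."""
    T = A.shape[1]
    D = second_difference_operator(T)
    V = A @ A.T / T
    evals, evecs = np.linalg.eigh(V)  # assumes V is positive definite
    W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T  # W = V^(-1/2)
    M = W @ (A @ D @ D.T @ A.T) @ W
    _, U = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    return U[:, :f].T @ W


def midpoint_residual(X):
    """Relative error when each column is predicted as the midpoint of its
    two temporal neighbours (circular sequence)."""
    mid = 0.5 * (np.roll(X, 1, axis=1) + np.roll(X, -1, axis=1))
    return np.linalg.norm(X - mid) / np.linalg.norm(X)


# Toy data: at frame t only atom t is active, so the feature "shifts" by one
# position per frame. In the raw code space this shift is highly non-linear.
n = 32
A = np.eye(n)
P = learn_embedding(A, f=3)
B = P @ A  # embedded trajectory, shape (3, n)

print("midpoint residual, sparse codes:", midpoint_residual(A))
print("midpoint residual, embedding:   ", midpoint_residual(B))
```

In this toy setting the embedded trajectory traces a smooth closed curve, so each frame is close to the average of its neighbours, whereas the raw one-hot codes are not. A real use would replace the identity matrix with codes inferred from a learned dictionary, which this sketch does not attempt.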

Bruno A. Olshausen | Yubei Chen | Dylan M. Paiton
