Feature Extraction Through LOCOCODE

Low-complexity coding and decoding (LOCOCODE) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods, it explicitly takes into account the information-theoretic complexity of the code generator: it yields lococodes that convey information about the input data and can be computed from the data and decoded by low-complexity mappings. We implement LOCOCODE by training autoassociators with flat minimum search, a recent, general method for discovering low-complexity neural nets. This approach turns out to unmix an unknown number of independent data sources by extracting a minimal number of low-complexity features necessary for representing the data. Experiments show that, unlike codes obtained with standard autoencoders, lococodes are based on feature detectors, are never unstructured, are usually sparse, and are sometimes factorial or local (depending on the statistical properties of the data). Although LOCOCODE is not explicitly designed to enforce sparse or factorial codes, it extracts optimal codes for difficult versions of the bars benchmark problem, whereas independent component analysis (ICA) and principal component analysis (PCA) do not. Applied to real-world images, it produces familiar, biologically plausible feature detectors and yields codes requiring fewer bits per pixel than those of ICA and PCA. Unlike ICA, it does not need to know the number of independent sources in advance. As a preprocessor for a vowel recognition benchmark problem, it sets the stage for excellent classification performance. Our results reveal an interesting, previously ignored connection between two important fields: regularizer research and ICA-related research. They may represent a first step toward a unification of regularization and unsupervised learning.
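To make the setup concrete, the sketch below trains a tiny autoassociator on the bars benchmark mentioned above. It is an illustration only, not the paper's implementation: the helper make_bars, the layer sizes, and all hyperparameters are invented for this example, and a plain L2 weight penalty stands in for the flat-minimum-search regularizer (the actual FMS term is a second-order penalty on the net's sensitivity to weight perturbations and is considerably more involved; see Hochreiter and Schmidhuber, 1997).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bars(n, size=5, p=0.2):
    """Toy 'bars' data: each image superposes random horizontal
    and vertical bars, each drawn independently with probability p."""
    X = np.zeros((n, size * size))
    for i in range(n):
        img = np.zeros((size, size))
        for r in range(size):
            if rng.random() < p:
                img[r, :] = 1.0  # horizontal bar
        for c in range(size):
            if rng.random() < p:
                img[:, c] = 1.0  # vertical bar
        X[i] = img.ravel()
    return X

X = make_bars(500)                        # 500 samples of 5x5 images
n_in, n_hid = X.shape[1], 10              # 10 hidden units: one per possible bar

W1 = rng.normal(0.0, 0.1, (n_in, n_hid))  # encoder weights
W2 = rng.normal(0.0, 0.1, (n_hid, n_in))  # decoder weights

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr, lam = 0.5, 1e-3  # learning rate; strength of the complexity penalty
for epoch in range(5000):
    H = sigmoid(X @ W1)   # code layer (the lococode candidate)
    Y = H @ W2            # linear reconstruction of the input
    E = Y - X             # reconstruction error
    # Gradients of  0.5 * (1/N) * sum(E**2) + 0.5 * lam * (||W1||^2 + ||W2||^2).
    # The L2 term is only a crude stand-in for the FMS complexity penalty.
    gW2 = H.T @ E / len(X) + lam * W2
    gH = (E @ W2.T) * H * (1.0 - H)
    gW1 = X.T @ gH / len(X) + lam * W1
    W1 -= lr * gW1
    W2 -= lr * gW2

print("reconstruction MSE:", np.mean((sigmoid(X @ W1) @ W2 - X) ** 2))
```

After training, one would inspect the columns of W2: under a complexity penalty of this kind, superfluous hidden units tend to die off, and the surviving decoder weights should come to resemble individual bars, i.e., the independent sources, rather than the unstructured mixtures a standard unregularized autoencoder typically produces.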
