Learning to represent visual input

One of the central problems in computational neuroscience is to understand how the object-recognition pathway of the cortex learns a deep hierarchy of nonlinear feature detectors. Recent progress in machine learning shows that it is possible to learn deep hierarchies without requiring any labelled data. The feature detectors are learned one layer at a time and the goal of the learning procedure is to form a good generative model of images, not to predict the class of each image. The learning procedure only requires the pairwise correlations between the activations of neuron-like processing units in adjacent layers. The original version of the learning procedure is derived from a quadratic ‘energy’ function but it can be extended to allow third-order, multiplicative interactions in which neurons gate the pairwise interactions between other neurons. A technique for factoring the third-order interactions leads to a learning module that again has a simple learning rule based on pairwise correlations. This module looks remarkably like modules that have been proposed by both biologists trying to explain the responses of neurons and engineers trying to create systems that can recognize objects.
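To make the pairwise-correlation learning rule concrete, the sketch below implements one plausible instance of the kind of module the abstract describes: a restricted Boltzmann machine with a quadratic energy function, trained greedily one layer at a time with one-step contrastive divergence, so that every weight update is a difference of pairwise correlations between visible and hidden activities. The layer sizes, toy binary data, and learning rate are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine with binary units.

    The energy is quadratic in the unit states,
        E(v, h) = -v.T W h - b.v - c.h,
    so the weight gradient is a difference of two pairwise correlations,
    <v h>_data - <v h>_model. One-step contrastive divergence (CD-1)
    approximates the model term with a one-step reconstruction.
    """

    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step on a mini-batch v0 of shape (batch, n_visible)."""
        # Positive phase: pairwise correlations measured on the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: reconstruct, then measure the correlations again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        batch = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error, for monitoring

# Greedy layer-wise stacking: train one RBM as a generative model of its
# input, then treat its hidden probabilities as "data" for the next layer.
if __name__ == "__main__":
    data = (rng.random((500, 64)) < 0.3).astype(float)  # toy binary "images"
    layer1, layer2 = RBM(64, 32), RBM(32, 16)
    for epoch in range(20):
        layer1.cd1_update(data)
    features = layer1.hidden_probs(data)
    for epoch in range(20):
        layer2.cd1_update(features)
```

The third-order, multiplicative version follows the same pattern: roughly speaking, factoring the three-way weight tensor into three factor matrices leaves each factor with a learning rule that is still a difference of pairwise correlations, now between a unit's activity and the product of the other two groups' factor outputs.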
