Learning with hidden variables

Learning and inferring the features that generate sensory input is a task continuously performed by the cortex. In recent years, novel algorithms and learning rules have been proposed that allow neural network models to learn such features from natural images, written text, audio signals, and other data. These networks usually involve deep architectures with many layers of hidden neurons. Here we review recent advances in this area, emphasizing, amongst other things, the processing of dynamical inputs by networks with hidden nodes and the role of single-neuron models. These points, and the questions they raise, can advance our conceptual understanding of learning in the cortex and of the relationship between machine learning approaches to learning with hidden nodes and learning in cortical circuits.
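
As a concrete illustration of the kind of learning rule at issue, the sketch below trains a restricted Boltzmann machine, a network with a single layer of hidden nodes, using one step of contrastive divergence (CD-1). This is a minimal NumPy example on toy binary data, not a specific model from the literature reviewed; the class name, hyperparameters, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine: a visible layer coupled to one hidden layer."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # couplings
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def sample_hidden(self, v):
        """P(h=1 | v) and a binary sample of the hidden units."""
        p = sigmoid(v @ self.W + self.c)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_visible(self, h):
        """P(v=1 | h) and a binary sample of the visible units."""
        p = sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_update(self, v0):
        """One contrastive-divergence (CD-1) step on a minibatch of binary data."""
        ph0, h0 = self.sample_hidden(v0)  # positive phase: data-driven statistics
        _, v1 = self.sample_visible(h0)   # one-step reconstruction of the data
        ph1, _ = self.sample_hidden(v1)   # negative phase: model-driven statistics
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

# Toy demonstration on random binary patterns (stand-ins for sensory input).
data = (rng.random((100, 6)) < 0.5).astype(float)
rbm = RBM(n_visible=6, n_hidden=3)
for _ in range(50):
    rbm.cd1_update(data)
```

In deeper architectures, or when the inputs are dynamical, the hidden-unit statistics that this update relies on become harder to estimate, which is one motivation for the approximate inference and learning schemes discussed in the review.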
