Prediction as a candidate for learning deep hierarchical models of data

Recent findings [HOT06] have made it possible to learn deep, layered, hierarchical representations of data that mimic how the brain works. It is hoped that this paradigm will unlock some of the power of the brain and lead to advances toward true AI. In this thesis I implement and evaluate state-of-the-art deep learning models and, using these as building blocks, investigate the hypothesis that predicting the sensory input from one moment to the next is a good learning objective. I introduce the Predictive Encoder (PE) and show that a simple, non-regularized learning rule that minimizes prediction error on natural video patches leads to receptive fields similar to those found in macaque visual area V1. I scale this model to video of natural scenes by introducing the Convolutional Predictive Encoder (CPE) and show similar results. Both models can serve as learning modules in deep architectures.
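The objective described above — encode the current input, predict the next input, and minimize the squared prediction error — can be illustrated with a minimal sketch. Everything here (dimensions, the tanh nonlinearity, the synthetic autoregressive data, the learning rate) is an illustrative assumption, not the thesis's actual implementation.

```python
import numpy as np

# Minimal predictive-encoder-style sketch: encode the current frame x_t,
# predict the next frame x_{t+1}, and reduce the squared prediction error
# by stochastic gradient descent (no regularization).

rng = np.random.default_rng(0)
n_vis, n_hid = 16, 32                           # input size, (overcomplete) hidden size
W_enc = rng.normal(0.0, 0.1, (n_hid, n_vis))    # encoder weights
W_dec = rng.normal(0.0, 0.1, (n_vis, n_hid))    # decoder / prediction weights
lr = 0.02

# Synthetic stand-in for consecutive video patches: a slowly varying
# autoregressive signal, so the next frame is predictable from the current one.
T = 2000
frames = np.empty((T, n_vis))
frames[0] = rng.normal(size=n_vis)
for t in range(1, T):
    frames[t] = 0.9 * frames[t - 1] + 0.3 * rng.normal(size=n_vis)

errors = []
for t in range(T - 1):
    x_t, x_next = frames[t], frames[t + 1]
    h = np.tanh(W_enc @ x_t)          # hidden code of the current frame
    pred = W_dec @ h                  # prediction of the next frame
    err = pred - x_next               # prediction error
    errors.append(float(err @ err))
    # Gradients of the squared error, then plain SGD updates.
    g_dec = np.outer(err, h)
    g_enc = np.outer((W_dec.T @ err) * (1.0 - h ** 2), x_t)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
```

On natural video patches the thesis reports that this kind of unregularized prediction objective yields V1-like receptive fields; the synthetic data here merely demonstrates that the prediction error falls under the learning rule.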

[1]  Michael W. Spratling, Unsupervised Learning of Generative and Discriminative Weights Encoding Elementary Image Components in a Predictive Coding Model of Cortical Function, 2012, Neural Computation.

[2]  J. Hahm,et al.  Cortically induced thalamic plasticity in the primate somatosensory system , 1998, Nature Neuroscience.

[3]  Bruno A Olshausen,et al.  Sparse coding of sensory inputs , 2004, Current Opinion in Neurobiology.

[4]  Rajesh P. N. Rao,et al.  Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects, 1999.

[5]  Jake Bouvrie,et al.  Notes on Convolutional Neural Networks , 2006 .

[6]  Karl J. Friston Learning and inference in the brain , 2003, Neural Networks.

[7]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[8]  J. Daugman Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[9]  Otto D. Creutzfeldt,et al.  Generality of the functional structure of the neocortex , 1977, Naturwissenschaften.

[10]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[11]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[12]  M. Bar,et al.  Top-down predictions in the cognitive brain , 2007, Brain and Cognition.

[13]  V. Mountcastle  The columnar organization of the neocortex, 1997, Brain: a journal of neurology.

[14]  M. Bar,et al.  Cortical Mechanisms Specific to Explicit Visual Object Recognition , 2001, Neuron.

[15]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[16]  P A Salin,et al.  Corticocortical connections in the visual system: structure and function. , 1995, Physiological reviews.

[17]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[18]  Graham W. Taylor,et al.  Learning local spatio-temporal features for activity recognition , 2010 .

[19]  Geoffrey E. Hinton,et al.  Two Distributed-State Models For Generating High-Dimensional Time Series , 2011, J. Mach. Learn. Res..

[20]  Henk J. Sips,et al.  On the use of small 2d convolutions on GPUs , 2010, ISCA'10.

[21]  Chuan Yi Tang,et al.  A 2.|E|-Bit Distributed Algorithm for the Directed Euler Trail Problem , 1993, Inf. Process. Lett..

[22]  David S. Touretzky,et al.  Advances in neural information processing systems 2 , 1989 .

[23]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[24]  Geoffrey E. Hinton,et al.  Phone recognition using Restricted Boltzmann Machines , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[26]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[27]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[28]  M. Alexander,et al.  Principles of Neural Science , 1981 .

[29]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Bo Chen,et al.  Deep Learning of Invariant Spatio-Temporal Features from Video , 2010 .

[32]  Michael W. Spratling Reconciling Predictive Coding and Biased Competition Models of Cortical Function , 2008, Frontiers Comput. Neurosci..

[33]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  F. Qiu,et al.  Figure and Ground in the Visual Cortex: V2 Combines Stereoscopic Cues with Gestalt Rules , 2005, Neuron.

[36]  S. Laughlin,et al.  Predictive coding: a fresh view of inhibition in the retina , 1982, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[37]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[38]  Peter Földiák,et al.  Learning Invariance from Transformation Sequences , 1991, Neural Comput..

[39]  Niko Wilbert,et al.  Slow feature analysis , 2011, Scholarpedia.

[40]  David Alais,et al.  Temporal whitening: transient noise perceptually equalizes the 1/f temporal amplitude spectrum. , 2009, Journal of vision.

[41]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[42]  C. Bundesen A theory of visual attention. , 1990, Psychological review.

[43]  L. Abbott,et al.  Competitive Hebbian learning through spike-timing-dependent synaptic plasticity , 2000, Nature Neuroscience.

[44]  Guo-An Chen,et al.  Acceleration of backpropagation learning using optimised learning rate and momentum , 1993 .

[45]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[46]  J. Hawkins,et al.  On Intelligence , 2004 .

[47]  Geoffrey E. Hinton,et al.  Unsupervised Learning of Image Transformations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[49]  Jacques de Villiers,et al.  Backpropagation neural nets with one and two hidden layers , 1993, IEEE Trans. Neural Networks.

[50]  D. C. Essen,et al.  Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. , 1996, Journal of neurophysiology.

[51]  Rajesh P. N. Rao,et al.  Spike-Timing-Dependent Hebbian Plasticity as Temporal Difference Learning , 2001, Neural Computation.

[52]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[53]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[54]  D. Buonomano,et al.  Cortical plasticity: from synapses to maps. , 1998, Annual review of neuroscience.

[55]  R. Desimone,et al.  Neural mechanisms of selective visual attention. , 1995, Annual review of neuroscience.

[56]  Yoshua Bengio,et al.  Unsupervised Models of Images by Spikeand-Slab RBMs , 2011, ICML.

[57]  Laurenz Wiskott,et al.  Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.

[58]  S. Ullman,et al.  Retinotopic Axis Specificity and Selective Clustering of Feedback Projections from V2 to V1 in the Owl Monkey , 2005, The Journal of Neuroscience.

[59]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[60]  D. Ruderman,et al.  Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[61]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[62]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[63]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[64]  Quoc V. Le,et al.  Measuring Invariances in Deep Networks , 2009, NIPS.

[65]  James V. Stone Learning Perceptually Salient Visual Parameters Using Spatiotemporal Smoothness Constraints , 1996, Neural Computation.

[66]  Johan Håstad,et al.  Almost optimal lower bounds for small depth circuits , 1986, STOC '86.

[67]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[68]  A. Krizhevsky Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[69]  J. Bullier,et al.  Reaching beyond the classical receptive field of V1 neurons: horizontal or feedback axons? , 2003, Journal of Physiology-Paris.

[70]  D. Ringach Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. , 2002, Journal of neurophysiology.

[71]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[72]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[73]  Claus C. Hilgetag,et al.  Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala , 2007, NeuroImage.

[74]  Hossein Mobahi,et al.  Deep learning from temporal coherence in video , 2009, ICML '09.