A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception

Although deep neural networks take loose inspiration from neuroscience, it is an open question how seriously to take the analogies between artificial deep networks and biological neuronal systems. Recent work has shown that deep convolutional neural networks (CNNs) trained on large-scale image recognition tasks can serve as strikingly good models for predicting the responses of neurons in visual cortex to visual stimuli, suggesting that analogies between artificial and biological neural networks may be more than superficial. However, while CNNs capture key properties of the average responses of cortical neurons, they fail to explain other properties of these neurons. For one, CNNs typically require large quantities of labeled input data for training. Our own brains, in contrast, rarely have access to this kind of supervision, so to the extent that representations are similar between CNNs and brains, this similarity must arise via different training paths. In addition, neurons in visual cortex produce complex time-varying responses even to static inputs, and they dynamically tune themselves to temporal regularities in the visual environment. We argue that these differences are clues to fundamental differences between the computations performed in the brain and in deep networks. To begin to close the gap, we study the emergent properties of a previously described recurrent generative network that is trained to predict future video frames in a self-supervised manner. Remarkably, the model captures a wide variety of seemingly disparate phenomena observed in visual cortex, ranging from single-unit response dynamics to complex perceptual motion illusions. These results suggest potentially deep connections between recurrent predictive neural network models and the brain, providing new leads that can enrich both fields.
