Hierarchical temporal prediction captures motion processing from retina to higher visual cortex

Visual neurons respond selectively to features that become increasingly complex in form and dynamics from the eyes to the cortex. These features take specific forms: retinal neurons prefer localized flashing dots¹, primary visual cortical (V1) neurons prefer moving bars²⁻⁴, and neurons in higher cortical areas, such as middle temporal (MT) cortex, favor complex features like moving textures⁵⁻⁷. Whether general principles underlie this diversity of response properties in the visual system has been an area of intense investigation. To date, no single normative model has been able to account for the hierarchy of tuning to dynamic inputs along the visual pathway. Here we show that hierarchical temporal prediction, that is, representing features that efficiently predict future sensory input from past sensory input⁸⁻¹¹, can explain how neuronal tuning properties, particularly those relating to motion, change from retina to higher visual cortex. In contrast to some other approaches¹²⁻¹⁶, the temporal prediction framework learns to represent features of unlabeled and dynamic stimuli, an essential requirement of the real brain. This suggests that the brain may not have evolved to efficiently represent all incoming stimuli, as implied by some leading theories. Instead, the selective representation of sensory features that help in predicting the future may be a general coding principle for extracting temporally structured features that depend on increasingly high-level statistics of the visual input.
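The core idea, learning features of the recent past that predict the immediate future, can be illustrated with a toy sketch. This is not the model used in the paper: the one-hidden-layer network, the tanh nonlinearity, the small L1 weight penalty (standing in here for "efficient" prediction), and all names and hyperparameters (`make_windows`, `train_temporal_predictor`, `n_hidden`, `l1`, `lr`) are assumptions chosen for illustration on a 1-D signal rather than natural movies.

```python
import numpy as np

def make_windows(signal, n_past):
    """Split a 1-D signal into (past window, next sample) pairs."""
    X = np.stack([signal[i:i + n_past] for i in range(len(signal) - n_past)])
    y = signal[n_past:]
    return X, y

def train_temporal_predictor(X, y, n_hidden=8, l1=1e-3, lr=0.05,
                             steps=2000, seed=0):
    """Fit hidden features of the past that predict the next sample.

    Minimizes 0.5 * mean squared prediction error plus an L1 penalty
    on the weights (an illustrative choice) by plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))  # past -> hidden features
    w2 = rng.normal(0.0, 0.1, n_hidden)                # hidden -> predicted future
    n = len(y)
    for _ in range(steps):
        h = np.tanh(X @ W1)            # hidden feature activations
        err = h @ w2 - y               # prediction error on the future sample
        grad_w2 = h.T @ err / n + l1 * np.sign(w2)
        grad_h = np.outer(err, w2) * (1.0 - h ** 2)    # backprop through tanh
        grad_W1 = X.T @ grad_h / n + l1 * np.sign(W1)
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1, w2

def prediction_mse(X, y, W1, w2):
    """Mean squared error of the predicted next sample."""
    return float(np.mean((np.tanh(X @ W1) @ w2 - y) ** 2))

if __name__ == "__main__":
    t = np.arange(400)
    signal = np.sin(0.3 * t)           # toy temporally structured input
    X, y = make_windows(signal, n_past=5)
    W1, w2 = train_temporal_predictor(X, y)
    print("baseline (predict the mean):", float(np.var(y)))
    print("temporal prediction MSE:    ", prediction_mse(X, y, W1, w2))
```

In this sketch the columns of `W1` play the role of learned temporal features. A hierarchical variant would, under the same assumptions, train a second such network to predict the future hidden activations of the first from their past.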

[1]  O. Marre,et al.  Toward a unified theory of efficient, predictive, and sparse coding , 2017, Proceedings of the National Academy of Sciences.

[2]  Stephanie E. Palmer,et al.  Optimal Prediction in the Retina and Natural Motion Statistics , 2016 .

[3]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[4]  D. Hubel,et al.  Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[5]  D. Ruderman,et al.  Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[6]  Andrew J King,et al.  Sensory cortex is optimized for prediction of future input , 2017, bioRxiv.

[7]  A. U.S.,et al.  Predictability, Complexity, and Learning , 2002 .

[8]  Charles G. Gross,et al.  Pattern recognition mechanisms , 1985 .

[9]  Konrad P. Körding,et al.  Extracting Slow Subspaces from Natural Videos Leads to Complex Cells , 2001, ICANN.

[10]  Eero P. Simoncelli,et al.  A model of neuronal responses in visual area MT , 1998, Vision Research.

[11]  Marcin J. Skwark,et al.  Improving Contact Prediction along Three Dimensions , 2014, PLoS Comput. Biol..

[12]  R. Shapley,et al.  Cat and monkey retinal ganglion cells and their visual functional roles , 1986, Trends in Neurosciences.

[13]  Laurenz Wiskott,et al.  Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.

[14]  D. H. Hubel,et al.  Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. , 1965, Journal of neurophysiology.

[15]  Matthias Bethge,et al.  Slowness and Sparseness Have Diverging Effects on Complex Cell Learning , 2014, PLoS Comput. Biol..

[16]  E. Adelson,et al.  The analysis of moving visual patterns , 1985 .

[17]  A. B. Bonds,et al.  Classifying simple and complex cells on the basis of response modulation , 1991, Vision Research.

[18]  Margaret S Livingstone,et al.  End-Stopping and the Aperture Problem Two-Dimensional Motion Signals in Macaque V1 , 2003, Neuron.

[19]  Richard E. Turner,et al.  A Structured Model of Video Reproduces Primary Visual Cortical Organisation , 2009, PLoS Comput. Biol..

[20]  Simon Osindero,et al.  Modelling the Statistics of Natural Images with Topographic Product of Student-t Models , 2004 .

[21]  E H Adelson,et al.  Spatiotemporal energy models for the perception of motion. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[22]  Konrad P. Körding,et al.  Learning the Nonlinearity of Neurons from Natural Visual Stimuli , 2003, Neural Computation.

[23]  Ralph D Freeman,et al.  Direction selectivity of neurons in the visual cortex is non‐linear and lamina‐dependent , 2016, The European journal of neuroscience.

[24]  E J Chichilnisky,et al.  A simple white noise analysis of neuronal light responses , 2001, Network.

[25]  H. B. Barlow,et al.  Possible Principles Underlying the Transformations of Sensory Messages , 2012 .

[26]  Konrad Paul Kording,et al.  How are complex cell properties adapted to the statistics of natural stimuli? , 2004, Journal of neurophysiology.

[27]  J. Movshon,et al.  Spatial summation in the receptive fields of simple cells in the cat's striate cortex. , 1978, The Journal of physiology.

[28]  Steven C Dakin,et al.  An oblique effect for local motion: psychophysics and natural movie statistics. , 2005, Journal of vision.

[29]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[30]  Eero P. Simoncelli,et al.  How MT cells analyze the motion of visual patterns , 2006, Nature Neuroscience.

[31]  Surya Ganguli,et al.  The emergence of multiple retinal cell types through efficient coding of natural movies , 2018, bioRxiv.

[32]  A. Borst Seeing smells: imaging olfactory learning in bees , 1999, Nature Neuroscience.

[33]  J. V. van Hateren,et al.  Independent component filters of natural images compared with simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[34]  Eero P. Simoncelli,et al.  Spatiotemporal Elements of Macaque V1 Receptive Fields , 2005, Neuron.

[35]  Bruno A. Olshausen,et al.  Sparse Coding Of Time-Varying Natural Images , 2010 .

[36]  S. Zeki Functional organization of a visual area in the posterior bank of the superior temporal sulcus of the rhesus monkey , 1974, The Journal of physiology.

[37]  Terrence J. Sejnowski,et al.  Multi-state Modeling of Biomolecules , 2014, PLoS Comput. Biol..

[38]  Eero P. Simoncelli,et al.  Characterization of Neural Responses with Stochastic Stimuli , 2004, The New Cognitive Neurosciences, 3rd edition (ed. M. Gazzaniga), MIT Press.

[39]  F. Jäkel,et al.  Spatial four-alternative forced-choice method is the preferred psychophysical method for naïve observers. , 2006, Journal of vision.

[40]  James C. R. Whittington,et al.  Theories of Error Back-Propagation in the Brain , 2019, Trends in Cognitive Sciences.

[41]  E. Bizzi,et al.  The Cognitive Neurosciences , 1996 .

[42]  Rajesh P. N. Rao,et al.  Predictive Coding , 2019, A Blueprint for the Hard Problem of Consciousness.

[43]  J. Movshon,et al.  Dynamics of motion signaling by neurons in macaque area MT , 2005, Nature Neuroscience.

[44]  Hassana K. Oyibo,et al.  Experience-dependent spatial expectations in mouse visual cortex , 2016, Nature Neuroscience.

[45]  Nicholas J. Priebe,et al.  Emergence of Orientation Selectivity in the Mammalian Visual Pathway , 2013, The Journal of Neuroscience.

[46]  Aapo Hyvärinen,et al.  A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images , 2001, Vision Research.

[47]  Bruno A. Olshausen,et al.  Learning Intermediate-Level Representations of Form and Motion from Natural Movies , 2012, Neural Computation.

[48]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[49]  Aapo Hyvärinen,et al.  Simple-Cell-Like Receptive Fields Maximize Temporal Coherence in Natural Video , 2003, Neural Computation.

[50]  Aapo Hyvärinen,et al.  Bubbles: a unifying framework for low-level statistical properties of natural image sequences. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[51]  R. Shapley,et al.  Orientation Selectivity in Macaque V1: Diversity and Laminar Dependence , 2002, The Journal of Neuroscience.

[52]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[53]  Michael S. Lewicki,et al.  Emergence of complex cell properties by learning to generalize in natural scenes , 2009, Nature.

[54]  S. Laughlin,et al.  Predictive coding: a fresh view of inhibition in the retina , 1982, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[55]  J. DiCarlo,et al.  Using goal-driven deep learning models to understand sensory cortex , 2016, Nature Neuroscience.

[56]  Peter Földiák,et al.  Learning Invariance from Transformation Sequences , 1991, Neural Comput..

[57]  S. W. Kuffler Discharge patterns and functional organization of mammalian retina. , 1953, Journal of neurophysiology.

[58]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[59]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[60]  Felix Creutzig,et al.  Predictive Coding and the Slowness Principle: An Information-Theoretic Approach , 2008, Neural Computation.

[61]  Geoffrey E. Hinton,et al.  Topographic Product Models Applied to Natural Scene Statistics , 2006, Neural Computation.

[62]  H. Barlow Summation and inhibition in the frog's retina , 1953, The Journal of physiology.

[63]  Anthony J. Movshon,et al.  Visual Response Properties of Striate Cortical Neurons Projecting to Area MT in Macaque Monkeys , 1996, The Journal of Neuroscience.

[64]  J. Movshon,et al.  Receptive field organization of complex cells in the cat's striate cortex. , 1978, The Journal of physiology.

[65]  Bruno A. Olshausen,et al.  Learning sparse, overcomplete representations of time-varying natural images , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[66]  M. Feller,et al.  Mechanisms underlying development of visual maps and receptive fields. , 2008, Annual review of neuroscience.

[67]  Anthony M. Norcia,et al.  Neural correlates of shape-from-shading , 2002 .

[68]  Lynne Kiorpes,et al.  Visual development in primates: Neural mechanisms and critical periods , 2015, Developmental neurobiology.