A Unified Theory Of Early Visual Representations From Retina To Cortex Through Anatomically Constrained Deep CNNs

The vertebrate visual system is hierarchically organized to process visual information in successive stages. Neural representations vary drastically across the first stages of visual processing: at the output of the retina, ganglion cell receptive fields (RFs) exhibit a clear antagonistic center-surround structure, whereas in the primary visual cortex (V1), typical RFs are sharply tuned to a precise orientation. There is currently no unified theory explaining these differences in representations across layers. Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we report two results. First, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and for the first time we find a single model from which both geometries spontaneously emerge at the appropriate stages of visual processing. The key constraint is a reduced number of neurons at the retinal output, consistent with the anatomy of the optic nerve as a stringent bottleneck. Second, we find that the nature of the retinal output depends on the complexity of the downstream cortical network: for simple cortical networks, visual representations at the retinal output emerge as nonlinear and lossy feature detectors, whereas for more complex cortical networks they emerge as linear and faithful encoders of the visual scene. This result predicts that the retinas of small vertebrates (e.g., salamander, frog) should perform sophisticated nonlinear computations, extracting features directly relevant to behavior, whereas the retinas of large animals such as primates should mostly encode the visual scene linearly and respond to a much broader range of stimuli. These predictions could reconcile the two seemingly incompatible views of the retina as performing either feature extraction or efficient coding of natural scenes, by suggesting that all vertebrates lie on a spectrum between these two objectives, depending on the amount of neural resources allocated to their visual system.
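As a rough illustration of the two resource constraints varied above (the number of neurons at the retinal output and the complexity of the cortical network), the following Python sketch lays out a bottlenecked layer stack and counts units per stage. It is a bookkeeping sketch only, not the authors' implementation: the names `build_model` and `units_per_stage`, the two-layer retinal stage, the default width of 32 channels, and the 32x32 input size are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConvLayer:
    """Hypothetical description of one convolutional layer (no weights)."""
    name: str
    in_channels: int
    out_channels: int


def build_model(bottleneck_channels: int, cortical_depth: int,
                input_channels: int = 1, width: int = 32) -> List[ConvLayer]:
    """Retinal stage ending in a narrow output, then a cortical stage.

    ASSUMPTION: two retinal layers and a fixed width for every other
    layer; the abstract specifies neither.
    """
    layers = [
        ConvLayer("retina_1", input_channels, width),
        # Retinal output: the bottleneck standing in for the optic nerve.
        ConvLayer("retina_out", width, bottleneck_channels),
    ]
    prev = bottleneck_channels
    for i in range(cortical_depth):
        layers.append(ConvLayer(f"cortex_{i + 1}", prev, width))
        prev = width
    return layers


def units_per_stage(layers: List[ConvLayer], height: int,
                    width_px: int) -> Dict[str, int]:
    """Unit count per layer, assuming stride 1 and 'same' padding."""
    return {layer.name: layer.out_channels * height * width_px
            for layer in layers}


if __name__ == "__main__":
    # Tight bottleneck feeding a two-layer cortical network.
    model = build_model(bottleneck_channels=1, cortical_depth=2)
    for name, n in units_per_stage(model, 32, 32).items():
        print(f"{name:>10}: {n} units")
```

Sweeping `bottleneck_channels` and `cortical_depth` in such a layout corresponds to the two axes along which the abstract compares models; training and receptive-field analysis are outside the scope of this sketch.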
