Modeling human vision using feedforward neural networks

In this thesis, we discuss the implementation, characterization, and evaluation of a new computational model for human vision. Our goal is to understand the mechanisms enabling invariant perception under scaling, translation, and clutter. The model is based on I-Theory [50], and uses convolutional neural networks. We investigate the explanatory power of this approach using the task of object recognition. We find that the model has important similarities with neural architectures and that it can reproduce human perceptual phenomena. This work may be an early step towards a more general and unified human vision model.

Thesis Supervisor: Tomaso Poggio
Title: Eugene McDermott Professor, BCS and CSAIL

[1]  Yann LeCun,et al.  The MNIST database of handwritten digits , 2005 .

[2]  Koray Kavukcuoglu,et al.  Multiple Object Recognition with Visual Attention , 2014, ICLR.

[3]  D. Levi,et al.  Visual crowding: a fundamental limit on conscious perception and object recognition , 2011, Trends in Cognitive Sciences.

[4]  P. Bex,et al.  A Unifying Model of Orientation Crowding in Peripheral Vision , 2015, Current Biology.

[5]  T. Poggio,et al.  Neural mechanisms of object recognition , 2002, Current Opinion in Neurobiology.

[6]  Eero P. Simoncelli,et al.  Metamers of the ventral stream , 2011, Nature Neuroscience.

[7]  P. Lennie.  Receptive fields , 2003, Current Biology.

[8]  D. Levi,et al.  The effect of similarity and duration on spatial interaction in peripheral vision. , 1994, Spatial vision.

[9]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[10]  J Y Lettvin,et al.  Enhancing the Perception of Form in Peripheral Vision , 1986, Perception.

[11]  N. Logothetis,et al.  Shape representation in the inferior temporal cortex of monkeys , 1995, Current Biology.

[12]  J. Lettvin,et al.  Task-determined strategies of visual process. , 1992, Brain research. Cognitive brain research.

[13]  R. Rosenholtz,et al.  Pooling of continuous features provides a unifying account of crowding , 2016, Journal of vision.

[14]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[15]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[16]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[17]  Tomaso Poggio,et al.  Fast Readout of Object Identity from Macaque Inferior Temporal Cortex , 2005, Science.

[18]  Anirvan S. Nandy,et al.  Saccade-confounded image statistics explain visual crowding , 2012, Nature Neuroscience.

[19]  Thomas Serre,et al.  A feedforward architecture accounts for rapid categorization , 2007, Proceedings of the National Academy of Sciences.

[20]  Thomas Serre,et al.  A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex , 2005 .

[21]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[22]  J. Grainger,et al.  Crowding affects letters and symbols differently. , 2010, Journal of experimental psychology. Human perception and performance.

[23]  C. Koch,et al.  A saliency-based search mechanism for overt and covert shifts of visual attention , 2000, Vision Research.

[24]  Thomas Serre,et al.  Categorization by Learning and Combining Object Parts , 2001, NIPS.

[25]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[26]  H. Bouma,et al.  Interaction Effects in Parafoveal Letter Recognition , 1970, Nature.

[27]  M. Herzog,et al.  Crowding, grouping, and object recognition: A matter of appearance. , 2015, Journal of vision.

[28]  Thomas Serre,et al.  How Deep is the Feature Analysis underlying Rapid Visual Categorization? , 2016, NIPS.

[29]  D. Marr,et al.  Smallest channel in early human vision. , 1980, Journal of the Optical Society of America.

[30]  J. O'Regan,et al.  Some results on translation invariance in the human visual system. , 1990, Spatial vision.

[31]  Gerald Jay Sussman,et al.  Building Robust Systems an essay , 2007 .

[32]  M. Herzog,et al.  When crowding of crowding leads to uncrowding. , 2013, Journal of vision.

[33]  Jos B. T. M. Roerdink,et al.  A Neurophysiologically Plausible Population Code Model for Feature Integration Explains Visual Crowding , 2010, PLoS Comput. Biol.

[34]  C. Gross,et al.  Visuotopic organization and extent of V3 and V4 of the macaque , 1988, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[35]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[36]  R. Rosenholtz,et al.  A summary-statistic representation in peripheral vision explains visual crowding. , 2009, Journal of vision.

[37]  D. Pelli,et al.  The uncrowded window of object recognition , 2008, Nature Neuroscience.

[38]  D. Pelli.  Crowding: a cortical constraint on object recognition , 2008, Current Opinion in Neurobiology.

[39]  Saman A. Zonouz,et al.  CloudID: Trustworthy cloud-based and cross-enterprise biometric identification , 2015, Expert Syst. Appl.

[40]  P. Cavanagh,et al.  Attentional resolution and the locus of visual awareness , 1996, Nature.

[41]  J. Maunsell,et al.  Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position. , 2003, Journal of neurophysiology.

[42]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  I. Rentschler,et al.  Peripheral vision and pattern recognition: a review. , 2011, Journal of vision.

[44]  Eero P. Simoncelli,et al.  A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients , 2000, International Journal of Computer Vision.

[45]  S M Anstis,et al.  Letter: A chart demonstrating variations in acuity with retinal position. , 1974, Vision research.

[46]  Harris Drucker,et al.  Learning algorithms for classification: A comparison on handwritten digit recognition , 1995 .

[47]  Tomaso A. Poggio,et al.  Computational role of eccentricity dependent cortical magnification , 2014, ArXiv.

[48]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  C. Furmanski,et al.  Perceptual learning in object recognition: object specificity and size invariance , 2000, Vision Research.

[50]  G. Kreiman,et al.  Timing, Timing, Timing: Fast Decoding of Object Information from Intracranial Field Potentials in Human Visual Cortex , 2009, Neuron.

[51]  Ha Hong,et al.  Performance-optimized hierarchical models predict neural responses in higher visual cortex , 2014, Proceedings of the National Academy of Sciences.

[52]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[53]  D. Pelli,et al.  Crowding is unlike ordinary masking: distinguishing feature integration from detection. , 2004, Journal of vision.

[54]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[55]  Yoshua Bengio,et al.  Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res.

[56]  Leslie G. Ungerleider,et al.  The modular organization of projections from areas V1 and V2 to areas V4 and TEO in macaques , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[57]  C. Gross,et al.  Visual topography of V2 in the macaque , 1981, The Journal of comparative neurology.

[58]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  S. Edelman,et al.  Imperfect Invariance to Object Translation in the Discrimination of Complex Shapes , 2001, Perception.

[60]  Tomaso Poggio,et al.  A hierarchical model of peripheral vision , 2011 .