Comparing visual representations across human fMRI and computational vision.

Feedforward visual object perception recruits a cortical network that is assumed to be hierarchical, progressing from basic visual features to complete object representations. However, the nature of the intermediate features involved in this transformation remains poorly understood. Here, we explore how well different computer vision recognition models account for neural object encoding across the human cortical visual pathway as measured using fMRI. These neural data, collected during the viewing of 60 images of real-world objects, were analyzed with a searchlight procedure as in Kriegeskorte, Goebel, and Bandettini (2006): within each searchlight sphere, the patterns of neural activity obtained for all 60 objects were compared to the responses of each computer vision model using representational dissimilarity analysis (Kriegeskorte et al., 2008). Although each of the computer vision methods significantly accounted for some of the neural data, the scale-invariant feature transform (SIFT; Lowe, 2004), which encodes local visual properties gathered from "interest points," accounted most accurately and consistently for stimulus representations within the ventral pathway. More generally, when present, significance was observed in regions of the ventral-temporal cortex associated with intermediate-level object perception. Differences in model effectiveness and in the neural location of significant matches may be attributable to the fact that each model implements a different featural basis for representing objects (e.g., more holistic or more parts-based). Overall, we conclude that well-known computer vision recognition systems may serve as viable proxies for theories of intermediate visual object representation.
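
The comparison step within a single searchlight sphere can be illustrated with a minimal sketch. It is not the authors' pipeline: the dissimilarity measure (1 minus Pearson correlation), the rank-correlation comparison, the simulated data, and all names (`rdm`, `rank_correlation`, `rda_score`, the array sizes) are assumptions chosen for illustration. Only the count of 60 objects comes from the abstract.

```python
import numpy as np


def rdm(patterns):
    """Condensed representational dissimilarity matrix.

    patterns: array of shape (n_objects, n_features).
    Returns the upper triangle of (1 - Pearson correlation) between
    every pair of rows. The distance measure is an assumption.
    """
    corr = np.corrcoef(patterns)
    upper = np.triu_indices(patterns.shape[0], k=1)
    return 1.0 - corr[upper]


def rank_correlation(a, b):
    """Spearman-style correlation computed with NumPy only.

    Ranks come from a double argsort, so ties are broken arbitrarily
    rather than averaged; adequate for continuous-valued data.
    """
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(rank_a, rank_b)[0, 1]


def rda_score(neural_patterns, model_features):
    """Agreement between a neural RDM and a model RDM for one sphere."""
    return rank_correlation(rdm(neural_patterns), rdm(model_features))


# Simulated data; sizes other than n_objects are hypothetical.
rng = np.random.default_rng(0)
n_objects, n_model_dims, n_voxels = 60, 128, 100

model_features = rng.normal(size=(n_objects, n_model_dims))

# A sphere whose voxels are noisy linear mixtures of the model features.
mixing = rng.normal(size=(n_model_dims, n_voxels))
sphere_related = model_features @ mixing + rng.normal(size=(n_objects, n_voxels))

# A sphere whose voxel responses are unrelated to the model.
sphere_unrelated = rng.normal(size=(n_objects, n_voxels))

score_related = rda_score(sphere_related, model_features)
score_unrelated = rda_score(sphere_unrelated, model_features)
print(f"related sphere:   {score_related:.3f}")
print(f"unrelated sphere: {score_unrelated:.3f}")
```

In a full searchlight analysis, a score like this would be computed for every sphere position and every model, and the resulting maps would then be thresholded statistically. That step is omitted here.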

[1]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[2]  J. G. Snodgrass,et al.  A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. , 1980, Journal of experimental psychology. Human learning and memory.

[3]  A. J. Mistlin,et al.  Neurones responsive to faces in the temporal cortex: studies of functional organization, sensitivity to identity and relation to perception. , 1984, Human neurobiology.

[4]  J. Daugman Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. , 1985, Journal of the Optical Society of America. A, Optics and image science.

[5]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  I. Biederman,et al.  Dynamic binding in a neural network for shape recognition. , 1992, Psychological review.

[7]  D G Pelli,et al.  Pixel independence: measuring spatial interactions on a CRT display. , 1997, Spatial vision.

[8]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[9]  D H Brainard,et al.  The Psychophysics Toolbox. , 1997, Spatial vision.

[10]  D G Pelli,et al.  The VideoToolbox software for visual psychophysics: transforming numbers into movies. , 1997, Spatial vision.

[11]  Russell A. Epstein,et al.  The Parahippocampal Place Area Recognition, Navigation, or Encoding? , 1999, Neuron.

[12]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[13]  Leslie G. Ungerleider,et al.  The Effect of Face Inversion on Activity in Human Neural Systems for Face and Object Perception , 1999, Neuron.

[14]  R. Vogels,et al.  Effect of image scrambling on inferior temporal cortical responses. , 1999, Neuroreport.

[15]  J. Haxby,et al.  The distributed human neural system for face perception , 2000, Trends in Cognitive Sciences.

[16]  M. Tarr,et al.  FFA: a flexible fusiform area for subordinate-level visual processing automatized by expertise , 2000, Nature Neuroscience.

[17]  Bosco S. Tjan,et al.  Adaptive Object Representation with Hierarchically-Distributed Memory Sites , 2000, NIPS.

[18]  N. Kanwisher,et al.  Cortical Regions Involved in Perceiving Object Shape , 2000, The Journal of Neuroscience.

[19]  Refractor Vision , 2000, The Lancet.

[20]  N. Kanwisher,et al.  The lateral occipital complex and its role in object recognition , 2001, Vision Research.

[21]  P. Schyns,et al.  Show Me the Features! Understanding Recognition From the Use of Visual Information , 2002, Psychological science.

[22]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[23]  Thomas E. Nichols,et al.  Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate , 2002, NeuroImage.

[24]  Keiji Tanaka Columns for complex visual object features in the inferotemporal cortex: clustering of cells with similar but slightly different stimulus selectivities. , 2003, Cerebral cortex.

[25]  N. Kanwisher,et al.  The fusiform face area subserves face perception, not generic within-category identification , 2004, Nature Neuroscience.

[26]  Ali Shokoufandeh,et al.  Shock Graphs and Shape Matching , 1998, International Journal of Computer Vision.

[27]  David G. Lowe  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[28]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[29]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  A. Young,et al.  Understanding the recognition of facial identity and facial expression , 2005, Nature Reviews Neuroscience.

[31]  Alice J. O'Toole,et al.  Partially Distributed Representations of Objects and Faces in Ventral Temporal Cortex , 2005, Journal of Cognitive Neuroscience.

[32]  Garrison W. Cottrell,et al.  Holistic Processing Develops Because it is Good , 2005 .

[33]  Benjamin B. Kimia,et al.  Shapes, shocks, and deformations I: The components of two-dimensional shape and the reaction-diffusion space , 1995, International Journal of Computer Vision.

[34]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[36]  Rainer Goebel,et al.  Information-based functional brain mapping. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[37]  Shimon Ullman,et al.  Mutual information of image fragments predicts categorization in humans: Electrophysiological and behavioral evidence , 2007, Vision Research.

[38]  Thomas Serre,et al.  A feedforward architecture accounts for rapid categorization , 2007, Proceedings of the National Academy of Sciences.

[39]  T. Poggio,et al.  A model of V4 shape selectivity and invariance. , 2007, Journal of neurophysiology.

[40]  Chun-Chia Kung,et al.  Is Region-of-Interest Overlap Comparison a Reliable Measure of Category Specificity? , 2007, Journal of Cognitive Neuroscience.

[41]  Keiji Tanaka,et al.  Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. , 2007, Journal of neurophysiology.

[42]  Eric T. Carlson,et al.  A neural code for three-dimensional object shape in macaque inferotemporal cortex , 2008, Nature Neuroscience.

[43]  J. Gallant,et al.  Identifying natural images from human brain activity , 2008, Nature.

[44]  Doris Y. Tsao,et al.  Mechanisms of face perception. , 2008, Annual review of neuroscience.

[45]  Keiji Tanaka,et al.  Matching Categorical Object Representations in Inferior Temporal Cortex of Man and Monkey , 2008, Neuron.

[46]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[47]  Doris Y. Tsao,et al.  A face feature space in the macaque temporal lobe , 2009, Nature Neuroscience.

[48]  John A. Pyles,et al.  Neural adaptation for novel objects during dynamic articulation , 2009, Neuropsychologia.

[49]  I. Gauthier,et al.  Beyond Shape: How You Learn about Objects Affects How They Are Represented in Visual Cortex , 2009, PloS one.

[50]  J. Schultz,et al.  Natural facial motion enhances cortical responses to faces , 2009, Experimental Brain Research.

[51]  Tom Michael Mitchell,et al.  A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes , 2010, PloS one.

[52]  Sam S. Tsai,et al.  Survey of SIFT Compression Schemes , 2010 .

[53]  Jascha D. Swisher,et al.  Multiscale Pattern Analysis of Orientation-Selective Activity in the Primary Visual Cortex , 2010, The Journal of Neuroscience.

[54]  Dwight J. Kravitz,et al.  Real-World Scene Representations in High-Level Visual Cortex: It's the Spaces More Than the Places , 2011, The Journal of Neuroscience.

[55]  Eric T. Carlson,et al.  Medial Axis Shape Coding in Macaque Inferotemporal Cortex , 2012, Neuron.

[56]  Riitta Salmelin,et al.  Tracking neural coding of perceptual and semantic features of concrete nouns , 2012, NeuroImage.

[57]  John A. Pyles,et al.  Exploring computational models of visual object perception , 2012 .