Visual word representation in the brain

The human visual system is thought to use features of intermediate complexity for scene representation. How the brain computes and represents these intermediate features, however, remains unclear. To study this, we tested the Bag of Words (BoW) model from computer vision against human brain activity. This computational model represents visual scenes with visual word histograms, which are candidate features of intermediate complexity, and it has proven effective in automatic object and scene recognition. We analyzed where in the brain, and to what extent, human fMRI responses to natural scenes can be accounted for by BoW representations. Voxel-wise application of a distance-based variation partitioning method reveals that BoW representations explain brain activity in visual areas V1, V2 and, in particular, V4. Area V4 is known to be tuned for features of intermediate complexity, suggesting that the BoW model captures intermediate-level scene representations in the human brain.
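
As an illustration of what a visual word histogram is, the following minimal sketch assigns each local image descriptor to its nearest entry in a codebook of visual words and counts the assignments. It is a generic BoW encoding under stated assumptions, not the pipeline used in this study: the function name `bow_histogram`, the toy 2-D descriptors, and the three-word codebook are all hypothetical, and in practice the descriptors would be high-dimensional local features and the codebook would be learned, for example by clustering.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Return a normalized visual word histogram for one image.

    descriptors: (n, d) array of local feature descriptors (hypothetical input).
    codebook:    (k, d) array of visual words (assumed to be learned beforehand).
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its nearest visual word.
    words = sq_dist.argmin(axis=1)
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    # Normalize so that images with different numbers of descriptors are comparable.
    return counts / counts.sum()

# Toy data, purely illustrative.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25 0.5  0.25]
```

Hard nearest-word assignment is only one possible encoding; soft assignment and other encoders are common alternatives, and nothing in the abstract specifies which one was used.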
