IN THE BRAIN : COMPARING HMAX AND BOW

The human visual system is thought to use features of intermediate complexity for scene representation. How the brain computationally represents intermediate features is, however, still unclear. Here we tested and compared two widely used computational models the biologically plausible HMAX model and Bag of Words (BoW) model from computer vision against human brain activity. These computational models use visual dictionaries, candidate features of intermediate complexity, to represent visual scenes, and the models have been proven effective in automatic object and scene recognition. We analyzed where in the brain and to what extent human fMRI responses to natural scenes can be accounted for by the HMAX and BoW representations. Voxel-wise application of a distance-based variation partitioning method reveals that HMAX explains significant brain activity in early visual regions and also in higher regions such as LO, TO while the BoW primarily explains brain acitvity in the early visual area. Notably, both HMAX and BoW explain the most brain activity in higher areas such as V4 and TO. These results suggest that visual dictionaries might provide a suitable computation for the representation of intermediate features in the brain.

[1]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[2]  C. Lawrence Zitnick,et al.  The role of features, algorithms and data in visual recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[4]  Michael S. Lewicki,et al.  Emergence of complex cell properties by learning to generalize in natural scenes , 2009, Nature.

[5]  Nikolaus Kriegeskorte,et al.  Frontiers in Systems Neuroscience Systems Neuroscience , 2022 .

[6]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[8]  P. Legendre,et al.  Variation partitioning of species data matrices: estimation and comparison of fractions. , 2006, Ecology.

[9]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[10]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[11]  Pierre Legendre,et al.  DISTANCE‐BASED REDUNDANCY ANALYSIS: TESTING MULTISPECIES RESPONSES IN MULTIFACTORIAL ECOLOGICAL EXPERIMENTS , 1999 .

[12]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[13]  R W Cox,et al.  AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. , 1996, Computers and biomedical research, an international journal.