The feature-weighted receptive field: an interpretable encoding model for complex feature spaces

ABSTRACT We introduce the feature‐weighted receptive field (fwRF), an encoding model designed to balance expressiveness, interpretability and scalability. The fwRF is organized around the notion of a feature map: a transformation of visual stimuli into visual features that preserves the topology of visual space (but not necessarily the native resolution of the stimulus). The key assumption of the fwRF model is that activity in each voxel encodes variation in a spatially localized region across multiple feature maps. This region is fixed for all feature maps; however, the contribution of each feature map to voxel activity is weighted. Thus, the model has two separable sets of parameters: “where” parameters that characterize the location and extent of pooling over visual features, and “what” parameters that characterize tuning to visual features. The “where” parameters are analogous to classical receptive fields, while the “what” parameters are analogous to classical tuning functions. Because these two sets of parameters are treated as separable, the complexity of the fwRF model is independent of the resolution of the underlying feature maps. This makes it possible to estimate models with thousands of high‐resolution feature maps from relatively small amounts of data. Once a fwRF model has been estimated from data, spatial pooling and feature tuning can be read off directly with no (or very little) additional post‐processing or in‐silico experimentation. We describe an optimization algorithm for estimating fwRF models from data acquired during standard visual neuroimaging experiments. We then demonstrate the model's application to two distinct sets of features: Gabor wavelets and features supplied by a deep convolutional neural network. We show that when Gabor feature maps are used, the fwRF model recovers receptive fields and spatial frequency tuning functions consistent with known organizational principles of the visual cortex. We also show that a fwRF model can be used to regress entire deep convolutional networks against brain activity. The ability to use whole networks in a single encoding model yields state‐of‐the‐art prediction accuracy. Our results suggest a wide variety of uses for the feature‐weighted receptive field model, from retinotopic mapping with natural scenes to regressing the activities of whole deep neural networks onto measured brain activity.

HIGHLIGHTS
We introduce a new encoding model: the feature‐weighted receptive field (fwRF).
A voxel's activity encodes one visual field region across many feature maps.
The fwRF model recovers voxel receptive field‐like properties and tuning functions.
Our method allows us to regress whole deep neural networks on brain activity.
We obtain state‐of‐the‐art prediction accuracy for voxels in the visual system.
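The separation between “where” and “what” parameters described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes, for illustration only, that the pooling region is an isotropic 2D Gaussian defined on a normalized visual field spanning [-1, 1] and that voxel activity is a plain weighted sum of the pooled feature maps. The function names `gaussian_pooling_field` and `fwrf_predict` and the parameter names `where` and `what` are hypothetical.

```python
import numpy as np


def gaussian_pooling_field(size, x0, y0, sigma):
    """Isotropic 2D Gaussian sampled on a size x size grid over [-1, 1].

    ASSUMPTION: the abstract only says the pooling region is spatially
    localized; the Gaussian shape is chosen here for illustration.
    The field is normalized so that its values sum to 1.
    """
    coords = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(coords, coords)
    g = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def fwrf_predict(feature_maps, where, what):
    """Predict one voxel's activity for T stimuli.

    feature_maps: list of K arrays, each of shape (T, size_k, size_k);
                  resolutions may differ between maps.
    where:        (x0, y0, sigma), shared by every feature map.
    what:         array of K feature weights, one per feature map.
    """
    x0, y0, sigma = where
    # Pool each map over the same visual-field region, resampled to the
    # map's own resolution. Result has shape (K, T).
    pooled = np.array([
        np.sum(gaussian_pooling_field(fm.shape[-1], x0, y0, sigma) * fm,
               axis=(-2, -1))
        for fm in feature_maps
    ])
    # Weight the pooled features by the voxel's tuning. Result has shape (T,).
    return np.asarray(what) @ pooled


# Two constant feature maps at different resolutions, 4 stimuli each.
maps = [np.ones((4, 8, 8)), np.ones((4, 32, 32))]
prediction = fwrf_predict(maps, where=(0.1, -0.2, 0.3), what=[2.0, 3.0])
```

In this sketch the voxel has 3 + K free parameters (three “where” values plus one weight per feature map) whether the maps are 8 x 8 or 32 x 32, which is the sense in which model complexity does not depend on feature-map resolution.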
