Performance-optimized hierarchical models only partially predict neural responses during perceptual decision making

Models of perceptual decision making have historically been designed to maximally explain behaviour and brain activity, independently of their ability to actually perform tasks. More recently, performance-optimized models have been shown to correlate with brain responses to images, and thus offer a complementary approach to understanding perceptual processes. In the present study, we compare how well these two approaches account for the spatio-temporal organization of neural responses elicited by ambiguous visual stimuli. Forty-six healthy human subjects made perceptual decisions on briefly flashed stimuli constructed from ambiguous characters. The stimuli were designed to have seven orthogonal properties, ranging from low-level sensory features (e.g. the spatial location of the stimulus) to conceptual features (whether the stimulus is a letter or a digit) and task-level features (i.e. the required hand movement). Magnetoencephalography source and decoding analyses revealed that these seven levels of representation are sequentially encoded along the cortical hierarchy and actively maintained until the subject responds. This hierarchy correlated poorly with normative, drift-diffusion, and 5-layer convolutional neural network (CNN) models optimized to categorize alphanumeric characters accurately, but partially matched the sequence of activations of three of the six state-of-the-art CNNs trained for natural image labeling that we tested (VGG-16, VGG-19, MobileNet). Additionally, we identify several systematic discrepancies between these CNNs and brain activity, which point to the importance of single-trial learning and recurrent processing. Overall, our results strengthen the notion that performance-optimized algorithms can converge towards the computational solution implemented by the human visual system, and open possible avenues for improving artificial perceptual decision making.
