Why Are Face and Object Processing Segregated in the Human Brain? Testing Computational Hypotheses with Deep Convolutional Neural Networks

Why does the human brain contain cortical regions specialized for the perception of some stimulus categories (e.g., faces) but not others (e.g., cars)? And why might functional specialization be a good design strategy for brains in the first place? Here, we used deep convolutional neural networks (CNNs) to test whether models optimized to recognize faces and objects require functional segregation of the two tasks. First, we trained two separate CNNs with the same architecture to categorize either faces or objects. Unsurprisingly, the face-trained CNN performed worse on object categorization than the object-trained CNN, and vice versa, demonstrating that the features optimized for each task differ. Second, following the method of Kell et al. (2018), we trained a family of dual-task CNNs on both tasks, asking how many layers can be shared before performance declines. Somewhat surprisingly, even the dual-task CNN that shared all layers performed nearly as well as the separate networks. This result is consistent with two hypotheses: 1) face and object recognition can both be performed well using a shared pool of common features, or 2) the shared network has learned “hidden” functional specialization. In ongoing work, we are seeking to disambiguate these two hypotheses.
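The "family of dual-task CNNs" can be pictured as one network per branch point: every layer before the branch point is shared by both tasks, and every layer after it is duplicated into a face-only and an object-only branch. The sketch below only enumerates such a family; it does not build or train any network. The layer names, the `DualTaskConfig` class, and the `dual_task_family` function are hypothetical and chosen for illustration, since the abstract does not specify the architecture. It also assumes that task-specific output classifiers sit on top of each configuration and leaves them out.

```python
from dataclasses import dataclass

# Hypothetical layer names for illustration only; the abstract does not
# state which architecture or how many layers were used.
LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7"]


@dataclass(frozen=True)
class DualTaskConfig:
    shared: tuple         # layers whose weights serve both tasks
    face_branch: tuple    # layers used only for face categorization
    object_branch: tuple  # layers used only for object categorization


def dual_task_family(layers):
    """Return one configuration per branch point, from no sharing to full sharing."""
    configs = []
    for n_shared in range(len(layers) + 1):
        shared = tuple(layers[:n_shared])
        private = tuple(layers[n_shared:])
        configs.append(
            DualTaskConfig(
                shared=shared,
                face_branch=tuple(f"face_{name}" for name in private),
                object_branch=tuple(f"object_{name}" for name in private),
            )
        )
    return configs


family = dual_task_family(LAYERS)
for cfg in family:
    print(f"{len(cfg.shared)} shared layers, {len(cfg.face_branch)} layers per task branch")
```

Under these assumptions, the first configuration corresponds to two fully separate networks and the last to the network that shares all layers. Comparing task performance across the family is what indicates how much sharing is possible before performance declines.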

[1] Omkar M. Parkhi et al. VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[2] Andrew Zisserman et al. Deep Face Recognition, 2015, BMVC.

[3] Daniel L. K. Yamins et al. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy, 2018, Neuron.

[4] N. Kanwisher et al. Domain specificity in visual cortex, 2006, Cerebral cortex.

[5] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[6] Ha Hong et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex, 2014, Proceedings of the National Academy of Sciences.

[7] Li Fei-Fei et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[8] Nikolaus Kriegeskorte et al. Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation, 2014, PLoS Comput. Biol.

[9] Jonas Kubilius et al. Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like?, 2018, bioRxiv.

[10] Josh H McDermott et al. Deep neural network models of sensory systems: windows onto the role of task constraints, 2019, Current Opinion in Neurobiology.

[11] Nancy Kanwisher et al. A cortical representation of the local visual environment, 1998, Nature.

[12] N. Kanwisher et al. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception, 1997, The Journal of Neuroscience.

[13] N. Kanwisher et al. The Human Body, 2001.