Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons

Deep supervised neural networks trained to classify objects have emerged as popular models of computation in the primate ventral stream. These models represent information with a high-dimensional distributed population code, implying that inferotemporal (IT) responses are also too complex to interpret at the single-neuron level. We challenge this view by modelling neural responses to faces in the macaque IT with a deep unsupervised generative model, beta-VAE. Unlike deep classifiers, beta-VAE "disentangles" sensory data into interpretable latent factors, such as gender or hair length. We found a remarkable correspondence between the generative factors discovered by the model and those coded by single IT neurons. Moreover, we were able to reconstruct face images using the signals from just a handful of cells. This suggests that the ventral visual stream may be optimising the disentangling objective, producing a neural code that is low-dimensional and semantically interpretable at the single-unit level.
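
As a rough illustration of the disentangling objective mentioned above, a beta-VAE loss can be sketched as a reconstruction term plus a KL term weighted by beta. This is a minimal sketch, not the authors' implementation: the function name `beta_vae_loss`, the squared-error reconstruction term (a Gaussian likelihood up to constants), the diagonal-Gaussian posterior, and the default `beta=4.0` are all assumptions made for illustration.

```python
import numpy as np


def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Batch mean of reconstruction error + beta * KL(q(z|x) || N(0, I)).

    x, x_recon: arrays of shape (batch, n_pixels)
    mu, logvar: arrays of shape (batch, n_latents) describing the
                diagonal Gaussian posterior q(z|x) (assumed form).
    beta:       weight on the KL term; the value 4.0 is illustrative only.
    """
    # Reconstruction term: squared error summed over pixels
    # (assumes a Gaussian likelihood, up to constant factors).
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + beta * kl))


# Usage: a perfect reconstruction with a posterior equal to the prior costs nothing.
x = np.ones((2, 5))
zero_loss = beta_vae_loss(x, x, np.zeros((2, 3)), np.zeros((2, 3)))
```

With beta greater than 1, the KL term pushes each latent dimension harder towards the unit Gaussian prior, which is the pressure usually credited with encouraging disentangled factors.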
