Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. However, many neurons exhibit $\textit{mixed selectivity}$, i.e., they respond to multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in $\textit{superposition}$, i.e., along non-orthogonal axes spanning multiple neurons, since the number of possible interpretable features in natural data generally exceeds the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to three recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that the brain may also deploy superposition. This also provides a link with disentanglement and raises fundamental questions about robust, efficient, and factorized representations in both artificial and biological neural systems.
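The superposition idea described above can be illustrated with a minimal numerical sketch (not the paper's actual method; all dimensions and the random embedding are illustrative assumptions): when more features than neurons are encoded along non-orthogonal directions, each individual neuron looks mixed-selective, yet projecting population activity onto a feature's own direction still recovers that feature.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_neurons = 8, 5  # more features than neurons

# Random unit-norm feature directions; with n_features > n_neurons
# they cannot all be orthogonal, so features are "in superposition".
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse input: only feature 3 is active.
f = np.zeros(n_features)
f[3] = 1.0

# Population activity: every neuron carries a mixture of features.
activations = f @ W

# Reading out along neuron axes is ambiguous, but projecting onto the
# feature directions themselves identifies the active feature.
recovered = activations @ W.T
print(recovered.argmax())  # index of the recovered feature
```

Here the diagonal term `W[3] @ W[3] = 1` dominates the off-diagonal overlaps (which are strictly below 1 by Cauchy-Schwarz for non-parallel unit vectors), so the direction-aligned readout is clean even though no single neuron is.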
