Aligning Artificial Neural Networks to the Brain yields Shallow Recurrent Architectures

Deep artificial neural networks with spatially repeated processing (a.k.a. deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from the simple feedforward eight-layer architecture of AlexNet to extremely deep and branching architectures such as NASNet, demonstrating ever-better object categorization performance. Here we ask: as ANNs have continued to improve in performance, have they remained strong candidate models of the brain? To answer this question, we developed Brain-Score, a composite of neural and behavioral benchmarks that score any ANN on how brain-like it is, together with an online platform where ANNs can be submitted to receive a Brain-Score and a rank relative to other models. Deploying our framework on dozens of state-of-the-art ANNs, we found that the ResNet and DenseNet families are the closest matches from the machine learning community to the primate ventral visual stream. Curiously, the best current ImageNet models, such as PNASNet, were not the top-performing models on Brain-Score. Despite their high scores, these deep models are often hard to map onto the brain's anatomy, owing to their vast number of layers and their lack of biologically important connections such as recurrence. To map more directly onto anatomy and to validate our approach, we built CORnet-S: a neural network developed using Brain-Score as a guide, under the anatomical constraints of compactness and recurrence. Although CORnet-S is a shallow model with only four anatomically mapped areas and recurrent connectivity, it is a top model on Brain-Score and outperforms similarly compact models on ImageNet.
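To make the architectural idea concrete, the following is a minimal, hypothetical PyTorch sketch of a CORnet-S-style model, not the authors' released implementation: four anatomically mapped areas, each of which reuses the same convolutional weights over several time steps, so that recurrence rather than stacked layers supplies computational depth. All module names, channel counts, and time-step counts below are illustrative assumptions (for instance, the real CORnet-S uses separate normalization per time step, which this sketch simplifies away).

    import torch
    import torch.nn as nn

    class RecurrentArea(nn.Module):
        """One anatomically mapped visual area (e.g., V1, V2, V4, or IT)."""
        def __init__(self, in_channels, out_channels, times=2):
            super().__init__()
            self.times = times
            # Feedforward input convolution, run once per image.
            self.input_conv = nn.Conv2d(in_channels, out_channels,
                                        kernel_size=3, padding=1)
            # Recurrent convolution: the SAME weights are applied at every
            # time step, trading depth in layers for depth in time.
            self.recurrent_conv = nn.Conv2d(out_channels, out_channels,
                                            kernel_size=3, padding=1)
            self.norm = nn.BatchNorm2d(out_channels)  # simplification: shared across steps
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            x = self.relu(self.norm(self.input_conv(x)))
            state = x
            for _ in range(self.times):
                # Unrolled recurrence with a skip connection from the
                # feedforward drive.
                state = self.relu(self.norm(self.recurrent_conv(state)) + x)
            return state

    # Four areas mapped onto the ventral stream, followed by a linear decoder.
    model = nn.Sequential(
        RecurrentArea(3, 64),     # "V1"
        nn.MaxPool2d(2),
        RecurrentArea(64, 128),   # "V2"
        nn.MaxPool2d(2),
        RecurrentArea(128, 256),  # "V4"
        nn.MaxPool2d(2),
        RecurrentArea(256, 512),  # "IT"
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(512, 1000),     # ImageNet classifier head
    )

    logits = model(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)

Because the recurrent weights are shared across time steps, the parameter count stays close to that of a compact feedforward network, which is how a model of this shape can remain small while still scoring well on ImageNet.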
