Towards robust vision by multi-task learning on monkey visual cortex

Deep neural networks set the state of the art across many tasks in computer vision, but their ability to generalize to simple image distortions is surprisingly fragile. In contrast, the mammalian visual system is robust to a wide range of perturbations. Recent work suggests that this generalization ability can be explained by useful inductive biases encoded in the representations of visual stimuli throughout the visual cortex. Here, we successfully leveraged these inductive biases with a multi-task learning approach: we jointly trained a deep network to perform image classification and to predict neural activity in macaque primary visual cortex (V1) in response to the same natural stimuli. We measured the out-of-distribution generalization ability of the resulting network by testing its robustness to common image distortions. We found that co-training on monkey V1 data indeed leads to increased robustness, even though none of those distortions were present during training. Moreover, our network's robustness is often very close to that of an Oracle network in which parts of the architecture are trained directly on noisy images. Our results also demonstrated that the network's representations become more brain-like as its robustness improves. Using a novel constrained reconstruction analysis, we investigated what makes our brain-regularized network more robust: compared to a Baseline network trained for image classification alone, the monkey co-trained network is more sensitive to image content than to noise. Using saliency maps predicted by DeepGaze for ImageNet images, we found that the monkey co-trained network tends to be more sensitive to salient regions of a scene, reminiscent of existing theories on the role of V1 in the detection of object borders and bottom-up saliency. Overall, our work expands the promising research avenue of transferring inductive biases from biological to artificial neural networks at the representational level, and provides a novel analysis of the effects of this transfer.
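To make the co-training setup concrete, below is a minimal PyTorch sketch of a shared-core, two-head multi-task model of the kind described above. It is an illustration under assumptions, not the paper's implementation: the tiny backbone, the neuron count, the Poisson loss for the neural head, and the weighting factor `alpha` are all placeholders.

```python
# Minimal sketch of joint classification + V1-response co-training.
# Architecture, neuron count, and loss weighting are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=1000, num_neurons=458):
        super().__init__()
        # Shared convolutional core (stand-in for a full backbone, e.g. VGG-style).
        self.core = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        # Task-specific heads operating on the same shared features.
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)
        # Readout predicting per-neuron firing rates; softplus keeps
        # predictions positive, since responses are non-negative counts.
        self.readout = nn.Linear(64 * 8 * 8, num_neurons)

    def forward(self, x):
        feats = self.core(x).flatten(1)
        logits = self.classifier(feats)
        rates = F.softplus(self.readout(feats))
        return logits, rates

def joint_loss(logits, labels, rates, responses, alpha=1.0):
    # Classification term: standard cross-entropy on image labels.
    ce = F.cross_entropy(logits, labels)
    # Neural-prediction term: Poisson negative log-likelihood, a common
    # choice for spike-count data (constant log-factorial term dropped).
    poisson = (rates - responses * torch.log(rates + 1e-8)).mean()
    return ce + alpha * poisson
```

In a setup like this, classification batches (ImageNet images with labels) and neural-data batches (stimuli with recorded V1 responses) can be interleaved or combined, with `alpha` controlling the trade-off between classification performance and neural predictivity; how that trade-off is scheduled is a design choice not fixed by this sketch.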
