Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features

This work presents a novel method for exploring human brain-visual representations, with a view to replicating these processes in machines. The core idea is to learn representations that are both computationally and biologically plausible by correlating human neural activity with natural images. To this end, we first propose a model, EEG-ChannelNet, that learns a brain manifold for EEG classification. After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach in which deep image and EEG encoders, trained in a siamese configuration, learn a joint manifold that maximizes a compatibility measure between visual features and brain representations. We then carry out image classification and saliency detection on the learned manifold. Performance analyses show that our approach satisfactorily decodes visual information from neural signals. This information can, in turn, be used to supervise the training of deep learning models, as demonstrated by the high performance of image classification and saliency detection on out-of-training classes. The results show that the learned brain-visual features improve performance while bringing deep models more in line with cognitive neuroscience work on visual perception and attention.
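
The siamese objective described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the compatibility measure is a dot product between EEG and image embeddings and that training uses a hinge-style ranking loss over mismatched pairs. The function names (`compatibility`, `siamese_hinge_loss`) and the `margin` parameter are hypothetical, and the embeddings stand in for the outputs of the EEG and image encoders.

```python
import numpy as np


def compatibility(eeg_emb, img_emb):
    """Pairwise compatibility scores (assumed here to be dot products).

    eeg_emb: (n, d) array of EEG embeddings.
    img_emb: (n, d) array of image embeddings; row i is the image shown
             while EEG sample i was recorded.
    Returns an (n, n) matrix whose entry [i, j] scores EEG i against image j.
    """
    return eeg_emb @ img_emb.T


def siamese_hinge_loss(eeg_emb, img_emb, margin=0.0):
    """Hinge-style ranking loss (an assumption, not the paper's exact loss).

    Penalizes every mismatched pair (i, j) whose compatibility comes within
    `margin` of, or exceeds, the compatibility of the matched pair (i, i).
    """
    scores = compatibility(eeg_emb, img_emb)
    positives = np.diag(scores)[:, None]          # matched-pair scores, shape (n, 1)
    violations = np.maximum(0.0, margin + scores - positives)
    off_diagonal = ~np.eye(scores.shape[0], dtype=bool)
    return violations[off_diagonal].mean()        # average over mismatched pairs only


# Toy embeddings: three orthogonal unit vectors per modality.
eeg = np.eye(3)
aligned_images = np.eye(3)                        # each image matches its EEG sample
shuffled_images = np.roll(np.eye(3), 1, axis=0)   # every image paired with the wrong EEG

loss_aligned = siamese_hinge_loss(eeg, aligned_images)
loss_shuffled = siamese_hinge_loss(eeg, shuffled_images)
```

In this toy setting the loss is zero when every EEG embedding is most compatible with its own image and positive when the pairing is shuffled. Minimizing such a loss with respect to the encoder parameters is one way to push matched pairs toward higher compatibility on a joint manifold.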
