Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised Text Latent Space

The generation of human cognitive content from functional magnetic resonance imaging (fMRI) data is an active research area. Cognitive content such as a viewed image can be estimated by analyzing the relationship between fMRI data and the semantic information of that image. In this paper, we propose a new method for generating captions for viewed images from human brain activity via a novel robust regression scheme. Unlike conventional generation methods, which rely on image feature representations, the proposed method uses text feature representations, which carry more semantic information and are therefore better suited to caption generation. We construct a text latent space and regress the fMRI data onto it. In constructing this space, we additionally make use of unlabeled images that are not used in the training phase, which improves caption generation performance. As a result, the proposed method can generate captions from fMRI data measured while subjects view images. Experimental results show that the proposed method enables accurate caption generation for viewed images.
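The step of regressing fMRI data onto a text latent space can be illustrated with a minimal sketch. It is not the proposed robust regression scheme: plain closed-form ridge regression is swapped in as a stand-in, the data are synthetic, and a nearest-neighbor lookup over candidate latent vectors replaces a generative caption decoder. All names (`fit_ridge`, `nearest_caption`, `true_map`) and all dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def nearest_caption(pred, candidate_latents):
    """For each predicted latent vector, return the index of the most
    cosine-similar candidate latent vector."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    c = candidate_latents / np.linalg.norm(candidate_latents, axis=1, keepdims=True)
    return np.argmax(p @ c.T, axis=1)


# Synthetic stand-ins (assumption): rows of X are fMRI voxel patterns and
# rows of Z are text latent vectors of the captions of the viewed images.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, latent_dim = 200, 20, 50, 8
true_map = rng.normal(size=(n_voxels, latent_dim))  # hypothetical ground truth
X_train = rng.normal(size=(n_train, n_voxels))
Z_train = X_train @ true_map + 0.1 * rng.normal(size=(n_train, latent_dim))
X_test = rng.normal(size=(n_test, n_voxels))
Z_test = X_test @ true_map

# Learn the fMRI-to-latent mapping, then project held-out fMRI patterns.
W = fit_ridge(X_train, Z_train, alpha=1.0)
Z_pred = X_test @ W

# Identify, for each test pattern, the closest candidate caption latent.
matches = nearest_caption(Z_pred, Z_test)
accuracy = float(np.mean(matches == np.arange(n_test)))
print(f"identification accuracy on synthetic data: {accuracy:.2f}")
```

In a real pipeline, the predicted latent vector would be passed to a caption decoder trained on the text latent space rather than matched against a fixed candidate set; the lookup here only keeps the sketch self-contained.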
