A neural decoding algorithm that generates language from visual activity evoked by natural images

Transforming neural activity into language would be transformative for human-computer interaction as well as for restoring communication in people with aphasia. The rapid development of artificial intelligence has made it feasible to decode neural signals evoked by human visual activity. In this paper, a novel Progressive Transfer Language Decoding Model (PT-LDM) is proposed to decode visual fMRI signals into phrases or sentences while natural images are being viewed. The PT-LDM consists of an image encoder, an fMRI encoder and a language decoder. The results showed that phrases and sentences were successfully generated from visual activity. Similarity analysis showed that three commonly used evaluation metrics, BLEU, ROUGE and CIDEr, reached average values of 0.182, 0.197 and 0.680, respectively, between the generated texts and the corresponding annotated texts in the test set, significantly higher than the baseline. Moreover, we found that higher visual areas usually performed better than lower visual areas, and that the contribution of visual response patterns to language decoding varied across successive time points. Our findings demonstrate that the neural representations elicited in visual cortices when scenes are viewed already contain semantic information that can be used to generate human language. Our study points to potential applications of language-based brain-machine interfaces, especially for helping people with aphasia communicate more efficiently using fMRI signals.
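To make the encoder-decoder structure described above concrete, the following is a minimal sketch, not the authors' implementation: an fMRI encoder mapping a voxel response pattern to a semantic embedding, and a language decoder (here a GRU) generating tokens conditioned on that embedding. All module names, dimensions, and the toy vocabulary are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FMRIEncoder(nn.Module):
    """Maps a flattened fMRI response pattern to a fixed-size semantic embedding."""

    def __init__(self, n_voxels: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, 512),
            nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.net(voxels)


class LanguageDecoder(nn.Module):
    """Generates a token sequence conditioned on the encoded brain response."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)
        self.init_hidden = nn.Linear(embed_dim, hidden_dim)

    def forward(self, brain_embedding: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Use the brain embedding to initialise the recurrent state,
        # then score the next token at every position (teacher forcing).
        h0 = torch.tanh(self.init_hidden(brain_embedding)).unsqueeze(0)
        out, _ = self.gru(self.token_embed(tokens), h0)
        return self.to_vocab(out)


# Toy forward pass with random data, only to illustrate tensor shapes.
encoder = FMRIEncoder(n_voxels=4000, embed_dim=256)
decoder = LanguageDecoder(vocab_size=1000, embed_dim=256, hidden_dim=512)
voxels = torch.randn(2, 4000)              # batch of 2 fMRI response patterns
tokens = torch.randint(0, 1000, (2, 12))   # 12-token annotated captions
logits = decoder(encoder(voxels), tokens)  # (2, 12, 1000) next-token scores
```

In a full pipeline of this kind, an image encoder would supply the training target for the fMRI encoder (progressively transferring visual-semantic features to the brain-signal branch), and the decoded token sequences would be compared with the annotated captions using BLEU, ROUGE and CIDEr.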