Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Neural network models for NLP are typically implemented without any explicit encoding of language rules, and yet they break one performance record after another. This has generated considerable research interest in interpreting the representations learned by these networks. We propose a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from four recent NLP models: ELMo, USE, BERT, and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type. Our results reveal differences in the context-related representations across these models. Further, in the transformer models, we find an interaction between layer depth and context length, and between layer depth and attention type. Finally, we hypothesize that altering BERT to better align with brain recordings would enable it to also better understand language. Probing the altered BERT using syntactic NLP tasks reveals that the model with increased brain alignment outperforms the original model. Cognitive neuroscientists have already begun using NLP networks to study the brain, and this work closes the loop, allowing the interaction between NLP and cognitive neuroscience to become a true cross-pollination.

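A minimal sketch of the kind of encoding-model analysis described above, assuming precomputed layer-wise embeddings from one of the NLP models and brain recordings (e.g., fMRI voxels) aligned to the same text. The variable names, the ridge penalty, and the cross-validation scheme are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: score how well embeddings from one layer of an NLP model predict
# brain recordings. `embeddings` (n_timepoints x n_features) and `brain`
# (n_timepoints x n_voxels) are hypothetical, pre-aligned arrays.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_model_score(embeddings, brain, alpha=1.0, n_splits=4):
    """Cross-validated per-voxel Pearson correlation between brain activity
    predicted from the embeddings (via ridge regression) and observed activity."""
    scores = np.zeros(brain.shape[1])
    kf = KFold(n_splits=n_splits)
    for train_idx, test_idx in kf.split(embeddings):
        model = Ridge(alpha=alpha)
        model.fit(embeddings[train_idx], brain[train_idx])
        pred = model.predict(embeddings[test_idx])
        # Pearson correlation per voxel between prediction and held-out data
        pred_c = pred - pred.mean(0)
        true_c = brain[test_idx] - brain[test_idx].mean(0)
        denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0)
        scores += (pred_c * true_c).sum(0) / np.maximum(denom, 1e-8)
    return scores / n_splits  # higher = embeddings better explain that voxel
```

Comparing these per-voxel scores across layers, context lengths, or attention variants yields the kind of model-to-brain alignment profile the abstract refers to.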