In Press, Psychophysiology

Neural Correlates of Word Representation Vectors in Natural Language Processing Models: Evidence from Representational Similarity Analysis of Event-Related Brain Potentials

Natural language processing models based on machine learning (ML-NLP models) have been developed to solve practical problems, such as interpreting an Internet search query. These models are not intended to reflect human language comprehension mechanisms, and the word representations used by ML-NLP models and human brains might therefore be quite different. However, because ML-NLP models are trained with the same kinds of inputs that humans must process, and they must solve many of the same computational problems as the human brain, ML-NLP models and human brains may end up with similar word representations. To distinguish between these hypotheses, we used representational similarity analysis to compare the representational geometry of word representations in two ML-NLP models with the representational geometry of the human brain, as indexed with event-related potentials (ERPs). Participants listened to stories while the electroencephalogram was recorded. We extracted averaged ERPs for each of the 100 words that occurred most frequently in the stories, and we calculated the similarity of the neural response for each pair of words. We compared this 100×100 similarity matrix to the 100×100 similarity matrix for the same word pairs according to each of two ML-NLP models. We found significant representational similarity between the neural data and each ML-NLP model, beginning within 250 ms of word onset. These results indicate that ML-NLP systems designed to solve practical technology problems have a representational geometry that is correlated with that of the human brain, presumably because both are influenced by the structural properties and statistics of language.
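To make the comparison concrete, the following is a minimal sketch of the core representational similarity analysis step in Python. It is not the authors' actual pipeline: the array names (erp_patterns, embeddings), their shapes, the use of correlation distance, and the Spearman comparison of the two geometries are illustrative assumptions; the paper's ERP preprocessing, time-resolved analysis, and statistical testing are not reproduced here.

# Minimal RSA sketch (illustrative, not the authors' pipeline). Assumes two
# hypothetical inputs with rows in the same word order:
#   erp_patterns: (100 words x features) averaged ERP amplitudes for one time window
#   embeddings:   (100 words x dims) word vectors from an ML-NLP model
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_similarity(erp_patterns: np.ndarray, embeddings: np.ndarray) -> float:
    """Correlate the representational geometries of ERP data and word embeddings."""
    # Condensed dissimilarity vectors of pairwise correlation distances
    # (1 - Pearson r) over all word pairs.
    erp_rdm = pdist(erp_patterns, metric="correlation")
    model_rdm = pdist(embeddings, metric="correlation")
    # Rank correlation between the two geometries; robust to monotonic
    # differences in how each space scales dissimilarity.
    rho, _ = spearmanr(erp_rdm, model_rdm)
    return rho

# Example usage with random stand-in data for 100 words:
rng = np.random.default_rng(0)
print(rsa_similarity(rng.normal(size=(100, 64)), rng.normal(size=(100, 300))))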
