Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps

Neural Language Models (NLMs) have made tremendous advances in recent years, achieving impressive performance on a variety of linguistic tasks. Capitalizing on this, studies in neuroscience have started to use NLMs to model neural activity in the human brain during language processing. However, many questions remain unanswered regarding which factors determine the ability of a neural language model to capture brain activity (i.e., its 'brain score'). Here, we take first steps in this direction and examine the impact of test loss, training corpus, and model architecture (comparing GloVe, LSTM, GPT-2, and BERT) on the prediction of functional Magnetic Resonance Imaging (fMRI) time courses of participants listening to an audiobook. We find that (1) untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words, with the untrained LSTM outperforming the transformer-based models because it is less affected by context; (2) training NLMs improves brain scores in the same brain regions irrespective of the model's architecture; (3) perplexity (test loss) is not a good predictor of brain score; and (4) training data have a strong influence on the outcome and, notably, off-the-shelf models may lack the statistical power to detect brain activations. Overall, we outline the impact of model-training choices and suggest good practices for future studies aiming to explain the human language system using neural language models.
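
Since the "brain score" is the central quantity of the abstract, a minimal sketch of how such a score is typically computed in NLM-to-fMRI encoding studies may help: model activations are regressed onto fMRI time courses, and the score is the cross-validated correlation between predicted and observed signal. The ridge-regression setup, the `brain_score` helper, and the toy data below are illustrative assumptions, not the authors' exact pipeline.

```python
# Illustrative sketch (not the paper's exact pipeline): compute a per-voxel
# "brain score" by regressing NLM activations onto fMRI time courses with
# ridge regression and averaging cross-validated Pearson correlations.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def brain_score(X, Y, alpha=1.0, n_splits=5):
    """X: (n_TRs, n_features) model activations aligned to scan times.
    Y: (n_TRs, n_voxels) fMRI time courses. Returns one r per voxel."""
    scores = np.zeros(Y.shape[1])
    for train, test in KFold(n_splits=n_splits).split(X):
        pred = Ridge(alpha=alpha).fit(X[train], Y[train]).predict(X[test])
        # Per-voxel Pearson correlation between predicted and true signal
        pred_c = pred - pred.mean(0)
        true_c = Y[test] - Y[test].mean(0)
        num = (pred_c * true_c).sum(0)
        den = np.sqrt((pred_c ** 2).sum(0) * (true_c ** 2).sum(0))
        scores += num / np.maximum(den, 1e-12)
    return scores / n_splits

# Toy usage: random features stand in for, e.g., GPT-2 layer activations
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
Y = X @ rng.standard_normal((50, 10)) + rng.standard_normal((200, 10))
print(brain_score(X, Y).round(2))
```

With an untrained model, the activations `X` still carry word-identity information, which is why the sketch above can yield non-trivial scores even before training.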
