Does injecting linguistic structure into language models lead to better alignment with brain recordings?

Neuroscientists evaluate deep neural networks for natural language processing as possible candidate models of how language is processed in the brain. These models are often trained without explicit linguistic supervision, yet they have been shown to learn some linguistic structure in the absence of such supervision (Manning et al., 2020), potentially calling into question the relevance of symbolic linguistic theories in modeling these cognitive processes (Warstadt and Bowman, 2020). Across two fMRI datasets, we evaluate whether language models align better with brain recordings when their attention is biased by annotations from syntactic or semantic formalisms. Using structure from dependency or minimal recursion semantics annotations, we find that alignment improves significantly for one of the datasets; for the other, the results are more mixed. We present an extensive analysis of these results. Our proposed approach enables the evaluation of more targeted hypotheses about the composition of meaning in the brain, expanding the range of possible scientific inferences a neuroscientist could make, and opens up new opportunities for cross-pollination between computational neuroscience and linguistics.
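To make the setup more concrete, the sketch below illustrates two ingredients implied by the abstract: turning a dependency parse into an attention bias mask that could be injected into a Transformer, and scoring model-brain alignment with representational similarity analysis (RSA, as in Kriegeskorte et al. [15]). This is a minimal, hypothetical sketch; the function names, mask construction, and toy data are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): (a) a dependency-based attention
# bias mask and (b) an RSA-style alignment score between model and fMRI vectors.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def dependency_attention_bias(heads):
    """Build a symmetric 0/1 mask allowing attention only between a token and
    its dependency head (plus self-attention on the diagonal).

    `heads[i]` is the index of token i's head, or -1 for the root. Such a mask
    could be injected into selected attention heads, e.g. by adding a large
    negative bias to the logits wherever the mask is 0 (an assumption here,
    not necessarily the authors' mechanism).
    """
    n = len(heads)
    mask = np.eye(n)
    for i, h in enumerate(heads):
        if h >= 0:
            mask[i, h] = mask[h, i] = 1.0
    return mask


def rsa_alignment(model_vecs, brain_vecs):
    """RSA: Spearman correlation between the condensed representational
    dissimilarity matrices of sentence embeddings and fMRI responses."""
    rdm_model = pdist(model_vecs, metric="cosine")
    rdm_brain = pdist(brain_vecs, metric="cosine")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return rho


if __name__ == "__main__":
    # Toy parse for "dogs chase cats": "chase" (index 1) is the root.
    print(dependency_attention_bias([1, -1, 1]))

    rng = np.random.default_rng(0)
    n_sents, model_dim, n_voxels = 50, 768, 200
    model_vecs = rng.standard_normal((n_sents, model_dim))  # hypothetical sentence embeddings
    brain_vecs = rng.standard_normal((n_sents, n_voxels))   # hypothetical fMRI responses
    print(f"RSA alignment (random baseline): {rsa_alignment(model_vecs, brain_vecs):.3f}")
```

One plausible use, in the spirit of linguistically-informed self-attention [20], is to constrain only a subset of attention heads with such a mask and then compare alignment scores between the structurally biased and unbiased models.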

[1] T. Graf, et al. Proceedings of the Society for Computation in Linguistics, 2018.

[2] Andrew McCallum, et al. Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?, 2018, ArXiv.

[3] Leila Wehbe, et al. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain), 2019, NeurIPS.

[4] Jiajun Zhang, et al. Fine-grained neural decoding with distributed word representations, 2020, Inf. Sci.

[5] S. Frank, et al. The ERP response to the amount of information conveyed by words in sentences, 2015, Brain and Language.

[6] W. Bruce Croft, et al. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013.

[7] Roger Levy, et al. Linking artificial and human neural representations of language, 2019, EMNLP.

[8] Alexander G. Huth, et al. Incorporating Context into Language Encoding Models for fMRI, 2018, bioRxiv.

[9] Emily M. Bender, et al. Sustainable Development and Refinement of Complex Linguistic Annotations at Scale, 2017.

[10] John T. Hale, et al. Hierarchical structure guides rapid linguistic predictions during naturalistic listening, 2019, PLoS ONE.

[11] Gavin C. Cawley, et al. On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation, 2010, J. Mach. Learn. Res.

[12] Geoffrey E. Hinton, et al. Similarity of Neural Network Representations Revisited, 2019, ICML.

[13] 知秀 柴田. Understand it in 5 minutes!? Skimming famous papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.

[14] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.

[15] Nikolaus Kriegeskorte, et al. Frontiers in Systems Neuroscience, 2022.

[16] Yonatan Belinkov, et al. Linguistic Knowledge and Transferability of Contextual Representations, 2019, NAACL.

[17] Dan Flickinger, et al. Minimal Recursion Semantics: An Introduction, 2005.

[18] Sampo Pyysalo, et al. Universal Dependencies v1: A Multilingual Treebank Collection, 2016, LREC.

[19] E. Koktová. The meaning of the sentence in its semantic and pragmatic aspects, 1991.

[20] Andrew McCallum, et al. Linguistically-Informed Self-Attention for Semantic Role Labeling, 2018, EMNLP.

[21] Joakim Nivre, et al. Køpsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding, 2020, IWPT.

[22] Mario Aguilar, et al. Predicting Neural Activity Patterns Associated with Sentences Using a Neurobiologically Motivated Model of Semantic Representation, 2016, Cerebral Cortex.

[23] Nancy Kanwisher, et al. Toward a universal decoder of linguistic meaning from brain activation, 2018, Nature Communications.

[24] Naoaki Okazaki, et al. Enhancing Machine Translation with Dependency-Aware Self-Attention, 2020, ACL.

[25] Omer Levy, et al. Emergent linguistic structure in artificial neural networks trained by self-supervision, 2020, Proceedings of the National Academy of Sciences.

[26] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.

[27] Gökhan Tür, et al. Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks, 2016, ArXiv.

[28] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.

[29] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[30] Arthur Flexer, et al. Fast Approximate Hubness Reduction for Large High-Dimensional Data, 2018, IEEE International Conference on Big Knowledge (ICBK).

[31] Jon Gauthier, et al. Does the brain represent words? An evaluation of brain decoding studies of language understanding, 2018, ArXiv.

[32] Philipp Koehn, et al. Abstract Meaning Representation for Sembanking, 2013, LAW@ACL.

[33] Ari Rappoport, et al. Universal Conceptual Cognitive Annotation (UCCA), 2013, ACL.

[34] Samuel R. Bowman, et al. Can neural networks acquire a structural bias from raw linguistic data?, 2020, CogSci.

[35] Luo Si, et al. Syntax-Enhanced Self-Attention-Based Semantic Role Labeling, 2019, EMNLP/IJCNLP.

[36] Johan Bos, et al. Towards Universal Semantic Tagging, 2017, IWCS.

[37] Tal Linzen, et al. Targeted Syntactic Evaluation of Language Models, 2018, EMNLP.

[38] Tom M. Mitchell, et al. Aligning context-based statistical models of language with brain activity during reading, 2014, EMNLP.

[39] Samuel R. Bowman, et al. A Gold Standard Dependency Corpus for English, 2014, LREC.

[40] Stephan Oepen, et al. SemEval 2014 Task 8: Broad-Coverage Semantic Dependency Parsing, 2014, *SEMEVAL.

[41] Antal van den Bosch, et al. Using stochastic language models (SLM) to map lexical, syntactic, and phonological information processing in the brain, 2017, PLoS ONE.

[42] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[43] W. Luh, et al. Abstract linguistic structure correlates with temporal activity during naturalistic comprehension, 2016, Brain and Language.

[44] Stephan Oepen, et al. Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies, 2012, LAW@ACL.

[45] Nancy Kanwisher, et al. Artificial Neural Networks Accurately Predict Language Processing in the Brain, 2020.

[46] Tom Michael Mitchell, et al. Predicting Human Brain Activity Associated with the Meanings of Nouns, 2008, Science.

[47] Yusuke Miyao, et al. SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing, 2015, *SEMEVAL.

[48] Yijia Liu, et al. HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding, 2019, CoNLL.

[49] Sampo Pyysalo, et al. Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection, 2020, LREC.

[50] Uwe Reyle, et al. From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, 1993, Studies in Linguistics and Philosophy.

[51] Marie Mikulová, et al. Announcing Prague Czech-English Dependency Treebank 2.0, 2012, LREC.

[52] Daniel Schwartz, et al. Inducing brain-relevant bias in natural language processing models, 2019, NeurIPS.

[53] Yoav Goldberg, et al. Assessing BERT's Syntactic Abilities, 2019, ArXiv.

[54] Yonatan Belinkov, et al. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks, 2017, IJCNLP.

[55] Johan Bos, et al. The Groningen Meaning Bank, 2013, JSSP.

[56] Jean-Rémi King, et al. Language processing in brains and deep neural networks: computational convergence and its limits, 2020.

[57] Willem Zuidema, et al. Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains, 2019, BlackboxNLP@ACL.

[58] Ari Rappoport, et al. A Transition-Based Directed Acyclic Graph Parser for UCCA, 2017, ACL.

[59] Georgiana Dinu, et al. Improving zero-shot learning by mitigating the hubness problem, 2014, ICLR.

[60] Stephan Oepen, et al. Broad-Coverage Semantic Dependency Parsing, 2014.