Structural Ambiguity and its Disambiguation in Language Model Based Parsers: the Case of Dutch Clause Relativization

This paper addresses structural ambiguity in Dutch relative clauses. Through the task of disambiguation by grounding, we study how the presence of a prior sentence can resolve relative clause ambiguities. We apply this method to two present-day neural parsers, in an attempt to demystify their parsing and language model components. Results show that a neurosymbolic parser, based on proof nets, is more open to data bias correction than an approach based on Universal Dependencies, although both setups suffer from a comparable initial data bias.
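
As an illustration of what disambiguation by grounding amounts to in practice, the sketch below probes a Dutch masked language model for its preferred reading of an ambiguous relative clause, with and without a preceding context sentence. The model choice (BERTje via the Hugging Face fill-mask pipeline) and the example sentences are assumptions made for illustration only; the paper's own experiments use full parsing architectures rather than a single masked-token probe.

```python
# A minimal sketch of a grounding probe, assuming a fill-mask setup with BERTje;
# the example sentences are illustrative and not taken from the paper's data.
from transformers import pipeline

fill = pipeline("fill-mask", model="GroNLP/bert-base-dutch-cased")

# Ambiguous Dutch relative clause: number agreement on the auxiliary decides
# whether the hunter (sg., "heeft") or the lions (pl., "hebben") are the subject.
target = "Dit is de jager die de leeuwen [MASK] gezien."
# A prior sentence intended to ground the object-relative reading.
context = "De leeuwen zagen gisteren iemand door het gras sluipen."

def auxiliary_preference(text: str) -> dict:
    """Score the two competing auxiliaries at the masked position."""
    return {r["token_str"]: r["score"] for r in fill(text, targets=["heeft", "hebben"])}

print("without context:", auxiliary_preference(target))
print("with context:   ", auxiliary_preference(f"{context} {target}"))
```

If grounding has an effect, probability mass should shift towards the auxiliary that agrees with the reading supported by the context sentence.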
