Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations

Recent research on text perturbations has revealed that shuffling word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks. These findings contradict the common understanding of how such models encode hierarchical and structural information, and even raise the question of whether word order is modeled through position embeddings at all. Motivated by this, the paper proposes nine probing datasets organized by type of controllable text perturbation for three Indo-European languages with varying degrees of word order flexibility: English, Swedish, and Russian. Based on a probing analysis of the M-BERT and M-BART models, we report that syntactic sensitivity depends on the language and the model's pre-training objectives. We also find that sensitivity grows across layers as the perturbation granularity increases. Finally, we show that the models barely use positional information to induce syntactic trees from their intermediate self-attention and contextualized representations.
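
For illustration, the sketch below shows one way a controllable word-order perturbation could be applied and how layer-wise hidden states and attention maps might then be extracted from multilingual BERT via the HuggingFace Transformers library. The n-gram shuffling function, the chosen checkpoint, and all names here are illustrative assumptions rather than the paper's exact pipeline.

```python
# A minimal sketch, assuming an n-gram-window shuffle as the perturbation and
# bert-base-multilingual-cased as the probed model; not the authors' code.
import random

import torch
from transformers import AutoModel, AutoTokenizer


def shuffle_within_ngrams(tokens, n, seed=0):
    """Shuffle words inside consecutive n-gram windows while keeping the
    windows themselves in place; n = len(tokens) degrades to a full shuffle."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), n):
        window = tokens[i:i + n]
        rng.shuffle(window)
        out.extend(window)
    return out


sentence = "the quick brown fox jumps over the lazy dog".split()
perturbed = " ".join(shuffle_within_ngrams(sentence, n=3))

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained(
    "bert-base-multilingual-cased",
    output_hidden_states=True,
    output_attentions=True,
)
model.eval()

with torch.no_grad():
    inputs = tokenizer(perturbed, return_tensors="pt")
    outputs = model(**inputs)

# One hidden-state tensor per layer (plus the embedding layer) and one
# attention map per layer: the intermediate representations that a probing
# classifier or a tree-induction method would consume.
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
print(len(outputs.attentions), outputs.attentions[-1].shape)
```

Comparing probe scores on the original versus perturbed inputs, layer by layer and per perturbation granularity, is the kind of analysis the abstract describes.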
