RuSentEval: Linguistic Source, Encoder Force!

The success of pre-trained transformer language models has generated a great deal of interest in how these models work and what they learn about language. However, prior research in the field is devoted mainly to English, and little is known about other languages. To address this gap, we introduce RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been explored before. We apply a combination of complementary probing methods to explore the distribution of various linguistic properties in five multilingual transformers for two typologically contrasting languages, Russian and English. Our results contradict the common understanding of how linguistic knowledge is represented and demonstrate that some properties are learned in a similar manner despite the differences between the languages.
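
To make the probing setup concrete, below is a minimal sketch of a standard layer-wise probing experiment, not the authors' exact pipeline: mean-pooled sentence representations are extracted from each layer of a frozen multilingual encoder and fed to a simple logistic-regression probe for one linguistic property. The model name, the toy sentences, and the "tense" labels are illustrative assumptions.

```python
# Minimal probing sketch (assumed setup, not the authors' code):
# a frozen multilingual encoder provides layer-wise sentence vectors,
# and a logistic-regression probe predicts one linguistic property per layer.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

MODEL_NAME = "bert-base-multilingual-cased"  # stand-in for any multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def embed(sentences, layer):
    """Mean-pooled sentence representations taken from the given encoder layer."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer]   # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)       # exclude padding from the mean
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Hypothetical toy data: Russian sentences labelled with verb tense
# ("He reads / read a book", "She writes / wrote a letter").
train_x, train_y = ["Он читает книгу.", "Он читал книгу."], ["Pres", "Past"]
test_x, test_y = ["Она пишет письмо.", "Она писала письмо."], ["Pres", "Past"]

for layer in range(1, model.config.num_hidden_layers + 1):
    probe = LogisticRegression(max_iter=1000).fit(embed(train_x, layer), train_y)
    acc = accuracy_score(test_y, probe.predict(embed(test_x, layer)))
    print(f"layer {layer:2d}: accuracy {acc:.2f}")
```

A deliberately simple linear probe keeps the classifier's own capacity low, so differences in layer-wise accuracy mostly reflect what the frozen representations already encode rather than what the probe can learn on its own; the complementary probing methods mentioned in the abstract are not reproduced in this sketch.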
