Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus

The development of automated approaches to linguistic acceptability has been greatly fostered by the availability of the English CoLA corpus, which is also included in the widely used GLUE benchmark. However, this kind of research on languages other than English, as well as the analysis of cross-lingual approaches, has been hindered by the lack of resources of comparable size. We have therefore developed the ItaCoLA corpus, which contains almost 10,000 sentences with acceptability judgments and was created following the same approach and the same steps as the English one. In this paper we describe the creation of the corpus, detail its content, and present the first experiments on this new resource. We compare in-domain and out-of-domain classification and perform a specific evaluation of nine linguistic phenomena. We also present the first cross-lingual experiments, aimed at assessing whether multilingual transformer-based approaches can benefit from using sentences in two languages during fine-tuning.
