CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence

In this paper, we present the CItA corpus (Corpus Italiano di Apprendenti L1), a collection of essays written by Italian L1 learners during the first and second years of lower secondary school. The corpus was built within an interdisciplinary study, carried out jointly by computational linguists and experimental pedagogists. The study aimed to track the development of written language competence across the two school years and to relate it to students' background information.