Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

The development of language proficiency models for non-native learners has been an active area of NLP research for the past few years. Although language proficiency is multidimensional in nature, existing research typically models a single “overall proficiency” score. Further, existing approaches also consider only one language at a time. This paper describes our experiments and observations on the role of pre-trained and fine-tuned multilingual embeddings in multi-dimensional, multilingual language proficiency classification. We report experiments with three languages – German, Italian, and Czech – and model seven dimensions of proficiency ranging from vocabulary control to sociolinguistic appropriateness. Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, no single feature set achieves consistently the best performance across all dimensions of language proficiency.
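
The sketch below illustrates one way such a setup can be realized: extract pre-trained multilingual BERT embeddings for learner texts and train a separate classifier for each proficiency dimension. This is a minimal illustration under assumptions, not the paper's exact pipeline; the dimension names, the `train_texts`/`train_labels`/`test_texts` variables, and the pooling strategy are hypothetical placeholders.

```python
# Minimal sketch: mean-pooled multilingual BERT embeddings + one
# scikit-learn classifier per proficiency dimension.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(texts):
    """Return mean-pooled last-layer token embeddings, one row per text."""
    vectors = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, truncation=True, max_length=512,
                            return_tensors="pt")
            hidden = model(**enc).last_hidden_state  # (1, seq_len, dim)
            vectors.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(vectors)

# Illustrative dimension names; the actual rating dimensions come from the
# learner corpus annotations (e.g., CEFR-style analytic ratings).
dimensions = ["vocabulary_control", "vocabulary_range", "grammatical_accuracy",
              "orthography", "coherence_cohesion",
              "sociolinguistic_appropriateness", "overall"]

# train_texts / test_texts: lists of learner essays (assumed given);
# train_labels: dict mapping each dimension to its list of CEFR labels.
X_train = embed(train_texts)
classifiers = {}
for dim in dimensions:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, train_labels[dim])
    classifiers[dim] = clf

X_test = embed(test_texts)
predictions = {dim: clf.predict(X_test) for dim, clf in classifiers.items()}
```

Training one independent classifier per dimension keeps the comparison between feature types (pre-trained vs. fine-tuned embeddings) straightforward; a multi-task variant sharing an encoder across dimensions is another plausible design.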
