Automated essay scoring using efficient transformer-based language models

Automated Essay Scoring (AES) is a cross-disciplinary effort involving Education, Linguistics, and Natural Language Processing (NLP). The efficacy of an NLP model in AES tests its ability to evaluate long-term dependencies and extrapolate meaning even when text is poorly written. Large pretrained transformer-based language models have dominated the current state-of-the-art in many NLP tasks; however, the computational requirements of these models make them expensive to deploy in practice. The goal of this paper is to challenge the paradigm in NLP that bigger is better when it comes to AES. To do this, we evaluate the performance of several fine-tuned pretrained NLP models with a modest number of parameters on an AES dataset. By ensembling our models, we achieve excellent results with fewer parameters than most pretrained transformer-based models.
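The ensembling approach described above can be sketched as follows: several small pretrained encoders, each fine-tuned with a regression head for essay scoring, produce independent score predictions that are then averaged. This is a minimal sketch, not the paper's exact pipeline; the checkpoint names, the single-output regression head, and the unweighted mean are illustrative assumptions (in practice one would load one's own fine-tuned AES checkpoints).

```python
# Minimal sketch of ensembling small fine-tuned transformer models for AES.
# The checkpoint names below are placeholders for fine-tuned AES models;
# loading them as-is yields randomly initialized regression heads.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = [
    "albert-base-v2",                       # ~12M parameters
    "google/electra-small-discriminator",   # ~14M parameters
    "google/mobilebert-uncased",            # ~25M parameters
]

def score_essay(essay: str) -> float:
    """Return the ensemble (mean) score prediction for one essay."""
    scores = []
    for name in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=1  # single-output regression head for the score
        )
        model.eval()
        inputs = tokenizer(
            essay, truncation=True, max_length=512, return_tensors="pt"
        )
        with torch.no_grad():
            logits = model(**inputs).logits  # shape (1, 1)
        scores.append(logits.item())
    # Simple unweighted average of the individual model predictions.
    return sum(scores) / len(scores)

print(score_essay("The essay text to be scored ..."))
```

An unweighted mean is the simplest combination rule; weighted averaging or stacking over validation predictions are common alternatives when the component models differ in quality.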
