Automatic Task Requirements Writing Evaluation via Machine Reading Comprehension

Task requirements (TRs) writing is an important question type in the Key English Test and the Preliminary English Test. A TR writing question may include multiple requirements, and a high-quality essay must respond to each requirement thoroughly and accurately. However, limited teacher resources prevent students from getting detailed grading instantly. The majority of existing automatic essay scoring systems focus on giving a holistic score but rarely provide reasons to support it. In this paper, we propose an end-to-end framework based on machine reading comprehension (MRC) to partially address this problem. The framework not only detects whether an essay responds to a requirement question, but also marks where in the essay the question is answered. Our framework consists of three modules: a question normalization module, an ELECTRA-based MRC module, and a response locating module. We extensively explore state-of-the-art MRC methods. Our approach achieves an accuracy of 0.93 and an F1 score of 0.85 on a real-world educational dataset. To encourage reproducible results, we make our code publicly available at https://github.com/aied2021TRMRC/AIED_2021_TRMRC_code.
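The three-module flow described above can be illustrated with a minimal sketch. It is not the released implementation: every name here (normalize_requirement, toy_reader, evaluate, Feedback) is hypothetical, the normalization rule is an assumed simplification, and a simple word-overlap heuristic stands in for the ELECTRA-based MRC module so that the example runs without any model weights.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A span is a (start, end) character offset pair into the essay; None means "no answer".
Span = Optional[Tuple[int, int]]

# Assumed stopword list, used only by the toy reader below.
STOPWORDS = {"the", "a", "an", "you", "your", "i", "to", "of",
             "what", "where", "when", "who", "how", "why"}


def normalize_requirement(requirement: str) -> str:
    """Module 1 (assumed rule): rewrite an imperative requirement as a question."""
    text = requirement.strip().rstrip(".")
    match = re.match(r"(?i)^(say|tell \w+|explain|describe)\s+(.*)$", text)
    if match:
        text = match.group(2)
    if not text:
        return "?"
    return text[0].upper() + text[1:] + "?"


def toy_reader(question: str, essay: str) -> Span:
    """Module 2 stand-in: NOT an MRC model, just content-word overlap per sentence."""
    q_words = set(re.findall(r"[a-z']+", question.lower())) - STOPWORDS
    best, best_score = None, 0
    for m in re.finditer(r"[^.!?]+[.!?]?", essay):
        s_words = set(re.findall(r"[a-z']+", m.group().lower()))
        score = len(q_words & s_words)
        if score > best_score:
            best, best_score = (m.start(), m.end()), score
    return best  # None when no sentence shares a content word with the question


@dataclass
class Feedback:
    requirement: str
    answered: bool
    evidence: str  # the essay text that responds to the requirement, or ""


def evaluate(requirement: str, essay: str,
             reader: Callable[[str, str], Span] = toy_reader) -> Feedback:
    """Module 3: turn the reader's span into a located response (or 'not answered')."""
    question = normalize_requirement(requirement)
    span = reader(question, essay)
    if span is None:
        return Feedback(requirement, False, "")
    start, end = span
    return Feedback(requirement, True, essay[start:end].strip())


if __name__ == "__main__":
    essay = ("Last weekend I went to the beach with my family. "
             "The weather was sunny. I ate ice cream.")
    for req in ("Say where you went.", "Explain how much the ticket cost."):
        print(evaluate(req, essay))
```

Because evaluate takes the reader as a parameter, the heuristic could in principle be swapped for any extractive question answering model that returns a character span or signals that the question is unanswerable, which is the role the abstract assigns to the MRC module.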
