Identifying Critical Features for Formative Essay Feedback with Artificial Neural Networks and Backward Elimination

Text analytic metrics (surface, syntactic, morphological and semantic features) can be used to predict essay quality and to provide formative feedback that helps students improve their essays. The aim of this study was to find a small set of features that serves as a fair proxy for the scores given by human raters. A large number of features were extracted from an existing corpus using a text analysis tool for the Dutch language. Artificial neural networks, the Levenberg-Marquardt algorithm and backward elimination were then used to reduce the number of extracted features automatically. Irrelevant features were eliminated based on the inter-rater agreement between predicted and human scores, measured with Cohen’s kappa (\(\kappa\)). With this algorithm, the number of features was reduced from 457 to 23. The selected features were grouped into six categories. Of these, we believe that the features in the categories “Word Difficulty” and “Lexical Diversity” are the most useful for providing automated formative feedback to students. The approach presented in this paper is a first step towards our ultimate goal of providing meaningful formative feedback that helps students enhance their writing skills.
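The selection loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: a leave-one-out nearest-centroid classifier stands in for the neural network trained with Levenberg-Marquardt, the stopping rule (stop once removing any feature lowers \(\kappa\)) is an assumption, the toy data are invented, and all function names (`cohen_kappa`, `loo_kappa`, `backward_elimination`) are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long lists of categorical scores."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_chance == 1.0:
        # Kappa is undefined when both raters use one identical label;
        # this sketch treats that case as perfect agreement (an assumption).
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def predict_nearest_centroid(train_X, train_y, x, features):
    """Stand-in model (NOT the paper's ANN): label of the closest class centroid."""
    sums, counts = {}, Counter(train_y)
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]

    def dist(label):
        return sum((x[f] - s / counts[label]) ** 2
                   for f, s in zip(features, sums[label]))

    return min(sorted(sums), key=dist)


def loo_kappa(X, y, features):
    """Kappa between leave-one-out predictions and the human scores y."""
    preds = [
        predict_nearest_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], features)
        for i in range(len(X))
    ]
    return cohen_kappa(preds, y)


def backward_elimination(X, y):
    """Repeatedly drop the feature whose removal keeps kappa highest."""
    features = list(range(len(X[0])))
    best = loo_kappa(X, y, features)
    while len(features) > 1:
        # Score each candidate subset obtained by removing one feature.
        scored = [(loo_kappa(X, y, [g for g in features if g != f]), f)
                  for f in features]
        kappa, drop = max(scored)
        if kappa < best:  # every removal hurts agreement: stop
            break
        features.remove(drop)
        best = kappa
    return features, best


# Invented toy data: column 0 separates the two score levels,
# column 1 is noise, column 2 is constant.
X = [[1.0, 9.0, 3.0], [1.2, 1.0, 3.0], [0.8, 8.0, 3.0], [1.1, 2.0, 3.0],
     [5.0, 1.5, 3.0], [5.2, 8.5, 3.0], [4.8, 2.5, 3.0], [5.1, 9.5, 3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, kappa = backward_elimination(X, y)
print(selected, kappa)
```

On this toy data the loop keeps only the informative column. In the setting of the study, the stand-in classifier would be replaced by the trained network and `X` by the 457 extracted features.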
