Effective Sampling for Large-scale Automated Writing Evaluation Systems

Automated writing evaluation (AWE) has been shown to be an effective mechanism for quickly providing feedback to students. It has already seen wide adoption in enterprise-scale applications and is starting to be adopted in large-scale contexts. Training an AWE model has historically required a single batch of several hundred writing examples, each with a human score. This requirement limits large-scale adoption of AWE because having humans score essays is costly. Here we evaluate algorithms for ensuring that AWE models are consistently trained using the most informative essays. Our results show how to minimize training set sizes while maximizing predictive performance, thereby reducing cost without unduly sacrificing accuracy. We conclude with a discussion of how to integrate this approach into large-scale AWE systems.
