Monitoring of Scoring Using the e‐rater® Automated Scoring System and Human Raters on a Writing Test
[1] Machteld Hoskens, et al. The Rater Bundle Model, 2001.
[2] Martin Chodorow, et al. Beyond Essay Length: Evaluating e-rater®'s Performance on TOEFL® Essays. Research Reports, Report 73, RR-04-04, 2004.
[3] Isaac I. Bejar, et al. A Validity-Based Approach to Quality Control and Assurance of Automated Scoring, 2011.
[4] Shelby J. Haberman, et al. Use of e-rater® in Scoring of the TOEFL iBT® Writing Test, 2011.
[5] David M. Williamson, et al. Evaluation of the e-rater® Scoring Engine for the GRE® Issue and Argument Prompts, 2012.
[6] Yigal Attali, et al. Construct Validity of e-rater® in Scoring TOEFL® Essays, 2007.
[7] Brian W. Junker, et al. The Hierarchical Rater Model for Rated Test Items and Its Application to Large-Scale Educational Assessment Data, 2002.
[8] David M. Williamson, et al. A Framework for Evaluation and Use of Automated Scoring, 2012.
[9] Jill Burstein, et al. Automated Essay Scoring with e-rater® V.2.0, 2004.
[10] P. Lachenbruch. Statistical Power Analysis for the Behavioral Sciences (2nd ed.), 1989.
[11] Brent Bridgeman, et al. Performance of a Generic Approach in Automated Essay Scoring, 2010.
[12] W. A. Shewhart, et al. Quality Control Charts, 1926.
[13] Leonard S. Cahen, et al. Educational Testing Service, 1970.
[14] von Davier, et al. The Use of Quality Control and Data Mining Techniques for Monitoring Scaled Scores: An Overview. Research Report ETS RR-12-20, 2012.
[15] Shelby J. Haberman, et al. Sample-Size Requirements for Automated Essay Scoring, 2008.
[16] Lihua Yao, et al. The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items, 2013.
[17] George Engelhard, et al. Examining Rater Errors in the Assessment of Written Composition with a Many-Faceted Rasch Model, 1994.
[18] M. Chodorow, et al. Beyond Essay Length: Evaluating e-rater®'s Performance on TOEFL® Essays, 2004.
[19] Shelby J. Haberman, et al. Sample-Size Requirements for Automated Essay Scoring. Research Report ETS RR-08-32, 2008.
[20] M. H. Omar, et al. Statistical Process Control Charts for Measuring and Monitoring Temporal Consistency of Ratings, 2010.
[21] Edward W. Wolfe, et al. Monitoring Rater Performance over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use, 2009.
[22] Shelby J. Haberman, et al. Use of e-rater® in Scoring of the TOEFL iBT® Writing Test. Research Report ETS RR-11-25, 2011.
[23] E. W. Wolfe, et al. Detecting Differential Rater Functioning over Time (DRIFT) Using a Rasch Multi-Faceted Rating Scale Model. Journal of Applied Measurement, 2001.
[24] Nicholas T. Longford. Models for Uncertainty in Educational Testing, 1995.
[25] Lawrence T. DeCarlo, et al. Studies of a Latent Class Signal Detection Model for Constructed Response Scoring II: Incomplete and Hierarchical Designs. Research Report ETS RR-10-08, 2010.