Evaluating Computer Automated Scoring: Issues, Methods, and an Empirical Illustration
Yongwei Yang | Chad W. Buckendahl | Dennison S. Bhola | Piotr J. Juszkiewicz
[1] A. Stuart. A Test for Homogeneity of the Marginal Distributions in a Two-Way Classification , 1955 .
[2] S. Siegel,et al. Nonparametric Statistics for the Behavioral Sciences , 2022, The SAGE Encyclopedia of Research Design.
[3] Karen Kukich,et al. Beyond Automated Essay Scoring , 2000 .
[4] A. E. Maxwell. Comparing the Classification of Subjects by Two Independent Judges , 1970, British Journal of Psychiatry.
[5] A. Feinstein,et al. High agreement but low kappa: II. Resolving the paradoxes. , 1990, Journal of clinical epidemiology.
[6] Henry Braun,et al. On the Synergy between Assessment and Instruction: Early Lessons from Computer-Based Simulations. , 1994 .
[7] Peter W. Foltz,et al. An introduction to latent semantic analysis , 1998 .
[8] Stephen G. Clyman,et al. The Generalizability of Scores for a Performance Assessment Scored with a Computer-Automated Scoring System. , 2000 .
[9] L. A. Johnson,et al. Dental Interactive Simulations Corporation (DISC): simulations for education, continuing education, and assessment. , 1998, Journal of Dental Education.
[10] W. A. Scott,et al. Reliability of Content Analysis: The Case of Nominal Scale Coding , 1955 .
[11] A. Feinstein,et al. High agreement but low kappa: I. The problems of two paradoxes. , 1990, Journal of clinical epidemiology.
[12] T. Landauer,et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .
[13] Thomas E. Piemme,et al. Development of a Scoring Algorithm to Replace Expert Rating for Scoring a Complex Performance-Based Assessment , 1997 .
[14] Stephen G. Clyman,et al. A Comparison of the Generalizability of Scores Produced by Expert Raters and Automated Scoring Systems , 1999 .
[15] W. Willett,et al. Misinterpretation and misuse of the kappa statistic. , 1987, American journal of epidemiology.
[16] A. E. Maxwell,et al. Coefficients of Agreement Between Observers and Their Interpretation , 1977, British Journal of Psychiatry.
[17] Chad W. Buckendahl,et al. A Review of Strategies for Validating Computer-Automated Scoring , 2002 .
[18] Michael T. Kane,et al. Validity Issues for Performance-Based Tests Scored With Computer-Automated Scoring Systems , 2002 .
[19] Stephen G. Clyman,et al. Development of Automated Scoring Algorithms for Complex Performance Assessments: A Comparison of Two Approaches. , 1997 .
[20] Randy Elliot Bennett,et al. Validity and Automated Scoring: It's Not Only the Scoring , 1998 .
[21] J. Fleiss. Measuring agreement between two judges on the presence or absence of a trait. , 1975, Biometrics.
[22] Isaac I. Bejar,et al. A methodology for scoring open-ended architectural design problems. , 1991 .
[23] R. Almond,et al. Making Sense of Data From Complex Assessments , 2002 .
[24] Jacob Cohen. A Coefficient of Agreement for Nominal Scales , 1960 .
[25] E. B. Page. Computer Grading of Student Prose, Using Modern Concepts and Software , 1994 .
[26] David M. Williamson,et al. "Mental Model" Comparison of Automated and Human Scoring , 1999 .
[27] T. Keith,et al. Trait Ratings for Automated Essay Grading , 2002 .
[28] Rebecca Zwick,et al. Another look at interrater agreement. , 1988, Psychological bulletin.
[29] Daniel Marcu,et al. Benefits of Modularity in an Automated Essay Scoring System , 2000, COLING 2000.
[30] William Wresch,et al. The Imminence of Grading Essays by Computer-25 Years Later , 1993 .
[31] Randy Elliot Bennett,et al. Validity and Automated Scoring: It's Not Only the Scoring , 1997 .
[32] W. Grove. Statistical Methods for Rates and Proportions, 2nd ed , 1981 .
[33] J. R. Landis,et al. The measurement of observer agreement for categorical data. , 1977, Biometrics.
[34] Stephen G. Clyman,et al. Scoring a Performance-Based Assessment by Modeling the Judgments of Experts , 1995 .