Paper Information - Judge Consistency and Severity Across Grading Periods

Judge Consistency and Severity Across Grading Periods

The purpose of this research project was to confirm that differences in the severity of judges and the stringency of grading periods occur, regardless of the nature of the assessment or the examination materials used. Three rather different examinations that require judges were analyzed, using an extended Rasch model to determine whether differences in judge severity and grading-period stringency were observable for all three examinations. Significant variation in judge severity and some variation across grading periods were found on all three examinations. This implies that regardless of the nature of the examination, items, or judges, examinee measures cannot be considered independent of the particular judges involved unless correction for severity is made systematically. Accounting for judge severity and grading-period stringency is extremely important when pass/fail decisions that are meant to generalize to competence are made, as in certification examinations.

Mary E. Lunz | John A. Stahl

[1] Georg Rasch, et al. Probabilistic Models for Some Intelligence and Attainment Tests, 1981, The SAGE Encyclopedia of Research Design.

[2] D. D. Gruijter. Two simple models for rater effects, 1984.

[3] B. Wright, et al. Best test design, 1979.

[4] Donald B. Rubin, et al. The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles, 1974.

[5] William B. Michael, et al. A Comparison of the Reliability and Validity of Ratings of Student Performance on Essay Examinations by Professors of English and by Professors in Other Disciplines, 1980.

[6] G. Masters, et al. Rating scale analysis, 1982.

[7] Henry Braun, et al. Understanding Scoring Reliability: Experiments in Calibrating Essay Readers, 1988.

[8] C. Cason, et al. A Deterministic Theory of Clinical Performance Rating, 1984, Evaluation & the health professions.

[9] R. Luce, et al. Simultaneous conjoint measurement: A new type of fundamental measurement, 1964.

[10] M. E. Lunz, et al. A comparison of intra- and interjudge decision consistencies using analytic and holistic scoring criteria, 1990.

[11] J. Littlefield, et al. A description and four-year analysis of a clinical clerkship evaluation system, 1981, Journal of medical education.