Statistical combinations of specific measures have been shown to be superior to expert judgment in several fields. In this study, judgment analysis was applied to examination marking to investigate the factors that influenced the marks awarded and contributed to differences between first and second markers. Seven markers in psychology rated 551 examination answers on seven 'aspects' for which specific assessment criteria had been developed to support good practice in assessment. The aspects were: addressing the question, covering the area, understanding, evaluation, development of argument, structure and organization, and clarity. Principal-components analysis indicated one major factor and no more than two minor factors underlying the seven aspects. Aspect ratings were then used to predict overall marks, with multiple regression 'capturing' the marking policy of each individual marker. Policies varied from marker to marker in the number of aspect ratings that made independent contributions to the prediction of overall marks and in the extent to which aspect ratings explained the variance in overall marks. Both the number of independently predictive aspect ratings and the amount of variance in overall marks explained by aspect ratings were consistently higher for first markers (question setters) than for second markers. Co-markers' overall marks were then used as an external criterion to test whether a simple model consisting of the sum of the aspect ratings improved on overall marks alone in predicting co-markers' marks. The model significantly increased the variance in co-markers' marks accounted for, but only for second markers, who had not taught the material and had not set the question. Further research is needed to develop the criteria and, especially, to establish the reliability and validity of specific aspects of assessment. The present results support the view that, for second markers at least, combined measures of specific aspects of examination answers may help to improve the reliability of marking.
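The sketch below illustrates the general analysis strategy described above, policy capturing by multiple regression and an unweighted-sum ('improper linear') model tested against a co-marker's marks, using synthetic data. The sample size, rating scale, and coefficients are hypothetical and are not taken from the study, which used seven markers and 551 real examination answers.

```python
# Hedged sketch of policy capturing and model comparison on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_answers = 200  # hypothetical number of answers rated by one marker pair

# Seven aspect ratings per answer (addressing the question, covering the area,
# understanding, evaluation, argument, structure, clarity), simulated on a 1-7 scale.
aspects = rng.integers(1, 8, size=(n_answers, 7)).astype(float)

# Simulated overall marks for a first and a second marker, each loosely driven
# by the aspect ratings plus independent noise.
first_marks = aspects @ rng.uniform(1, 3, 7) + rng.normal(0, 5, n_answers)
second_marks = aspects @ rng.uniform(1, 3, 7) + rng.normal(0, 8, n_answers)

# 1. 'Capture' the first marker's policy: regress overall marks on aspect ratings.
policy = LinearRegression().fit(aspects, first_marks)
print("R^2 of aspects predicting overall marks:",
      round(policy.score(aspects, first_marks), 2))

# 2. Improper linear model: the unweighted sum of the aspect ratings.
aspect_sum = aspects.sum(axis=1)

# 3. Does adding the aspect sum to the overall mark improve prediction of the
#    co-marker's mark (the external criterion)?
overall_only = first_marks.reshape(-1, 1)
r2_overall = LinearRegression().fit(overall_only, second_marks).score(overall_only, second_marks)
both = np.column_stack([first_marks, aspect_sum])
r2_both = LinearRegression().fit(both, second_marks).score(both, second_marks)
print(f"R^2 using overall mark only:      {r2_overall:.2f}")
print(f"R^2 adding the aspect-rating sum: {r2_both:.2f}")
```

In the study itself, the increment in explained variance from the aspect-rating sum was tested formally and reached significance only for second markers; the print-out above simply shows where that comparison would be read off in such an analysis.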