Recent developments in multifaceted Rasch measurement (Linacre, 1989) have made possible new kinds of investigation of aspects (or 'facets') of performance assessments. Relevant characteristics of such facets (for example, the relative harshness of individual raters or the relative difficulty of test tasks) are modelled and reflected in the resulting person ability measures. In addition, bias analyses, that is, analyses of interactions between elements of different facets, can also be carried out. (For the facet 'person', an element is an individual candidate; for the facet 'rater', an element is an individual judge; and so on.) This permits investigation of the way a particular aspect of the test situation (type of candidate, choice of prompt, etc.) may elicit a consistently biased pattern of responses from a rater. The purpose of the research is to investigate the use of these analytical techniques in rater training for the speaking subtest of the Occupational English Test (OET), a specific-purpose ESL performance test for health professionals. The test involves a role-play based, profession-specific interaction, with some degree of choice of role-play material. Data are presented from two rater training sessions separated by an 18-month interval and from a subsequent operational test administration session. The analysis is used to establish 1) the consistency of rater characteristics across different occasions; and 2) rater bias in relation to occasion of rating. The study thus addresses the question of the stability of rater characteristics, which has practical implications for the accreditation of raters and for the data analysis required following test administration sessions. It also has research implications concerning the role of multifaceted Rasch measurement in understanding rater behaviour in performance assessment contexts.
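For reference, the many-facet Rasch model of Linacre (1989) that underlies analyses of this kind is commonly written, in its standard three-facet rating-scale formulation, as follows (the notation here is the conventional one, not symbols taken from the present study):

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

where \(P_{nijk}\) is the probability that candidate \(n\), on task \(i\), rated by judge \(j\), receives category \(k\) rather than \(k-1\); \(B_n\) is candidate ability, \(D_i\) task difficulty, \(C_j\) rater severity, and \(F_k\) the step difficulty of category \(k\). A bias analysis of the kind described above adds an interaction term (for example, a rater-by-occasion term) to this model and tests whether its estimate departs significantly from zero.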