Rater characteristics and rater bias: implications for training

Recent developments in multifaceted Rasch measurement (Linacre, 1989) have made possible new kinds of investigation of aspects (or 'facets') of performance assessments. Relevant characteristics of such facets (for example, the relative harshness of individual raters, the relative difficulty of test tasks) are modelled and reflected in the resulting person ability measures. In addition, bias analyses, that is, analyses of interactions between elements of particular facets, can be carried out. (For the facet 'person', an element is an individual candidate; for the facet 'rater', an element is an individual judge, and so on.) This permits investigation of the way a particular aspect of the test situation (type of candidate, choice of prompt, etc.) may elicit a consistently biased pattern of responses from a rater. The purpose of the research is to investigate the use of these analytical techniques in rater training for the speaking subtest of the Occupational English Test (OET), a specific-purpose ESL performance test for health professionals. The test involves a role-play-based, profession-specific interaction, with some degree of choice of role-play material. Data are presented from two rater training sessions separated by an 18-month interval and from a subsequent operational test administration session. The analysis is used to establish: (1) the consistency of rater characteristics over different occasions; and (2) rater bias in relation to occasion of rating. The study thus addresses the question of the stability of rater characteristics, which has practical implications for the accreditation of raters and the requirements of data analysis following test administration sessions. It also has research implications concerning the role of multifaceted Rasch measurement in understanding rater behaviour in performance assessment contexts.
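For readers unfamiliar with the model, the kind of analysis described above can be sketched using a standard formulation of the many-facet Rasch model (following Linacre, 1989), where $P_{njik}$ is the probability of candidate $n$ receiving a rating in category $k$ from rater $j$ on task $i$:

\[
\log\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) = B_n - C_j - D_i - F_k
\]

Here $B_n$ is the ability of the candidate, $C_j$ the severity (harshness) of the rater, $D_i$ the difficulty of the task, and $F_k$ the difficulty of rating step $k$ relative to step $k-1$. On this formulation, a bias analysis of the kind reported in the study augments the baseline model with an interaction term (for example, a rater-by-occasion term, labelled here illustratively as $C_{jm}$) and examines whether its estimate departs significantly from zero, indicating a consistently biased pattern of ratings.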