A FACETS analysis of rater bias in measuring Japanese second language writing performance

The purpose of this study is twofold. First, using FACETS (Linacre, 1996), it investigates how the judgements of trained teacher raters are biased towards certain types of candidates and certain criteria when assessing Japanese second language (L2) writing. Previous studies that identified significantly biased rater-candidate interactions did not discuss who the candidates were; this study examines those interactions in much greater detail. Second, since there is no established rating scale for assessing Japanese L2 writing, this study explores the potential of a modified version of Jacobs et al.'s (1981) rating scale for norm-referenced decisions about Japanese L2 writing ability. The participants comprised 234 university candidates and three trained teacher raters. The raters produced highly correlated scores and were self-consistent, but significant differences in overall severity emerged. The raters scored certain candidates and criteria more leniently or harshly than others, and each rater's bias pattern was different. The highest percentage of significantly biased rater-candidate interactions occurred among candidates whose ability was extremely high or low. The study suggests that the modified version of Jacobs et al.'s scale can be a reliable tool for assessing Japanese L2 writing in norm-referenced settings, but that multiple ratings are still necessary.
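For orientation, the FACETS analyses mentioned above are based on the many-facet Rasch model. The abstract does not reproduce the model, so the following is a standard three-facet statement of it (in the form commonly cited from McNamara, 1996); the exact parameterization used in the study may differ:

```latex
\[
\log\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) \;=\; B_n \;-\; C_j \;-\; D_i \;-\; F_k
\]
% where:
%   P_{njik}    = probability of candidate n receiving score k from rater j on criterion i
%   B_n         = ability of candidate n
%   C_j         = severity of rater j
%   D_i         = difficulty of criterion i
%   F_k         = difficulty of scale step k relative to step k-1
```

Bias analysis of the kind reported in the study then inspects residual interaction terms (e.g., rater-by-candidate or rater-by-criterion) for statistically significant departures from the model's expectations.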

[1] K. Hirose et al. Development of an analytic rating scale for Japanese L1 writing, 1999.

[2] Martha C. Pennington et al. Comparing writing process and product across two languages: A study of 6 Singaporean university student writers, 1993.

[3] P. Congdon et al. Rater Severity in Large-Scale Assessment: Is It Invariant?, 1997.

[4] H. Jacobs. Testing ESL Composition: A Practical Approach, 1981.

[5] Alister Cumming et al. Expertise in evaluating second language compositions, 1990.

[6] Sara Cushing Weigle et al. Using FACETS to model rater training effects, 1998.

[7] Marjorie Bingham Wesche et al. Second language performance testing: the Ontario Test of ESL as an example, 1987.

[8] J. D. Brown et al. The Alternatives in Language Assessment, 1998.

[9] Patterns of rater behaviour in the assessment of an oral interaction test, 1994.

[10] Elana Shohamy et al. The Effect of Raters' Background and Training on the Reliability of Direct Writing Tests, 1992.

[11] Geoff Brindley et al. The Promise and the Challenge, 2012.

[12] Miyuki Sasaki et al. Explanatory variables for Japanese students' expository writing in English: An exploratory study, 1994.

[13] Michael Rube Redfield et al. Assessing Language Ability in the Classroom, 1998.

[14] Accounting for nonsystematic error in performance ratings, 1996.

[15] James Dean Brown et al. Designing Second Language Performance Assessments, 1998.

[16] Brian K. Lynch et al. Investigating variability in tasks and rater judgements in a performance test of foreign language speaking, 1995.

[17] K. Kondo-Brown. Heritage Language Students of Japanese in Traditional Foreign Language Classes: A Preliminary Empirical Study, 2001.

[18] Mary E. Lunz et al. Judge Consistency and Severity Across Grading Periods, 1990.

[19] James Dean Brown et al. Testing in language programs, 1996.

[20] Grant Henning et al. A Guide to Language Testing: Development, Evaluation, Research, 1987.

[21] Brian K. Lynch et al. Using G-theory and Many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants, 1998.

[22] Tom Lumley et al. Rater characteristics and rater bias: implications for training, 1995.

[23] Liz Hamp-Lyons et al. Communicative Writing Profiles: An Investigation of the Transferability of a Multiple-Trait Scoring Instrument Across ESL Writing Assessment Contexts, 1991.

[24] Gillian Wigglesworth. Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction, 1993.

[25] T. McNamara. Measuring Second Language Performance, 1996.

[26] Andrew D. Cohen et al. Assessing Language Ability in the Classroom, 1994.

[27] Kyle Perkins et al. On the Use of Composition Scoring Techniques, Objective Measures, and Objective Tests to Evaluate ESL Writing Ability, 1983.

[28] George Engelhard et al. Evaluating Rater Accuracy in Performance Assessments, 1996.

[29] Liz Hamp-Lyons. Assessing Second Language Writing in Academic Contexts, 1991.

[30] M. Lunz et al. A Method to Compare Rater Severity across Several Administrations, 1997.

[31] Anne Brown et al. The effect of rater variables in the development of an occupation-specific language performance test, 1995.

[32] P. Robinson et al. The Development of Task-Based Assessment in English for Academic Purposes Programs, 1996.

[33] J. D. Brown et al. A Categorical Instrument for Scoring Second Language Writing Skills, 1984.