A FACETS analysis of rater bias in measuring Japanese second language writing performance

The purpose of this study is twofold. First, using FACETS (Linacre, 1996), it investigates how the judgements of trained teacher raters are biased towards certain types of candidates and certain criteria when assessing Japanese second language (L2) writing. Previous studies that identified significantly biased rater-candidate interactions did not discuss who the candidates were; this study examines those interactions in much greater detail. Second, since there is no established rating scale for assessing Japanese L2 writing, this study explores the potential of a modified version of Jacobs et al.'s (1981) rating scale for norm-referenced decisions about Japanese L2 writing ability. The participants comprised 234 university candidates and three trained teacher raters. The raters produced highly correlated scores and were self-consistent, but significant differences in overall severity emerged. The raters scored certain candidates and criteria more leniently or harshly than others, and each rater's bias pattern was different. The highest percentage of significantly biased rater-candidate interactions occurred among candidates whose ability was extremely high or low. The study suggests that the modified version of Jacobs et al.'s scale can be a reliable tool for assessing Japanese L2 writing in norm-referenced settings, but that multiple ratings are still necessary.
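For orientation, the FACETS analyses mentioned above are based on the many-facet Rasch model. The abstract does not reproduce the model, so the following is a standard three-facet statement of it (in the form commonly cited from McNamara, 1996); the exact parameterization used in the study may differ:

```latex
\[
\log\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) \;=\; B_n \;-\; C_j \;-\; D_i \;-\; F_k
\]
% where:
%   P_{njik}    = probability of candidate n receiving score k from rater j on criterion i
%   B_n         = ability of candidate n
%   C_j         = severity of rater j
%   D_i         = difficulty of criterion i
%   F_k         = difficulty of scale step k relative to step k-1
```

Bias analysis of the kind reported in the study then inspects residual interaction terms (e.g., rater-by-candidate or rater-by-criterion) for statistically significant departures from the model's expectations.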

[1] K. Hirose et al. Development of an analytic rating scale for Japanese L1 writing, 1999.

[2] Martha C. Pennington et al. Comparing writing process and product across two languages: A study of 6 Singaporean university student writers, 1993.

[3] P. Congdon et al. Rater Severity in Large-Scale Assessment: Is It Invariant?, 1997.

[4] H. Jacobs. Testing ESL Composition: A Practical Approach, 1981.

[5] Alister Cumming et al. Expertise in evaluating second language compositions, 1990.

[6] Sara Cushing Weigle et al. Using FACETS to model rater training effects, 1998.

[7] Marjorie Bingham Wesche et al. Second language performance testing: the Ontario Test of ESL as an example, 1987.

[8] J. D. Brown et al. The Alternatives in Language Assessment, 1998.

[9] Patterns of rater behaviour in the assessment of an oral interaction test, 1994.

[10] Elana Shohamy et al. The Effect of Raters' Background and Training on the Reliability of Direct Writing Tests, 1992.

[11] Geoff Brindley et al. The Promise and the Challenge, 2012.

[12] Miyuki Sasaki et al. Explanatory variables for Japanese students' expository writing in English: An exploratory study, 1994.

[13] Michael Rube Redfield et al. Assessing Language Ability in the Classroom, 1998.

[14] Accounting for nonsystematic error in performance ratings, 1996.

[15] James Dean Brown et al. Designing Second Language Performance Assessments, 1998.

[16] Brian K. Lynch et al. Investigating variability in tasks and rater judgements in a performance test of foreign language speaking, 1995.

[17] K. Kondo-Brown. Heritage Language Students of Japanese in Traditional Foreign Language Classes: A Preliminary Empirical Study, 2001.

[18] Mary E. Lunz et al. Judge Consistency and Severity Across Grading Periods, 1990.

[19] James Dean Brown et al. Testing in language programs, 1996.

[20] Grant Henning et al. A Guide to Language Testing: Development, Evaluation, Research, 1987.

[21] Brian K. Lynch et al. Using G-theory and Many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants, 1998.

[22] Tom Lumley et al. Rater characteristics and rater bias: implications for training, 1995.

[23] Liz Hamp-Lyons et al. Communicative Writing Profiles: An Investigation of the Transferability of a Multiple-Trait Scoring Instrument Across ESL Writing Assessment Contexts, 1991.

[24] Gillian Wigglesworth. Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction, 1993.

[25] T. McNamara. Measuring Second Language Performance, 1996.

[26] Andrew D. Cohen et al. Assessing Language Ability in the Classroom, 1994.

[27] Kyle Perkins et al. On the Use of Composition Scoring Techniques, Objective Measures, and Objective Tests to Evaluate ESL Writing Ability, 1983.

[28] George Engelhard et al. Evaluating Rater Accuracy in Performance Assessments, 1996.

[29] Liz Hamp-Lyons. Assessing Second Language Writing in Academic Contexts, 1991.

[30] M. Lunz et al. A Method to Compare Rater Severity across Several Administrations, 1997.

[31] Anne Brown et al. The effect of rater variables in the development of an occupation-specific language performance test, 1995.

[32] P. Robinson et al. The Development of Task-Based Assessment in English for Academic Purposes Programs, 1996.

[33] J. D. Brown et al. A Categorical Instrument for Scoring Second Language Writing Skills, 1984.