A many-facet Rasch analysis of the second language group oral discussion task

FACETS many-facet Rasch analysis software (Linacre, 1998a) was used to examine two consecutive administrations of a large-scale (more than 1,000 examinees) second language oral assessment: a peer group discussion task taken by Japanese English-major university students. The facets modeled in the analysis were examinee, prompt, rater, and five rating-category 'items.' Both datasets showed strong unidimensionality, and approaches to interpreting fit values for the modeled facets are discussed. Examinee ability was the most substantial facet, followed by rater severity and item difficulty; the prompt facet was negligible in magnitude. Differences in rater severity were generally large, but this characteristic was not stable over time for individuals: returning raters tended to move toward greater severity and consistency, while new raters showed much more inconsistency. Analysis of the rating scales indicated generally valid gradations of scale steps, though raters had some difficulty distinguishing between categories at the ends of the scales for pronunciation and communicative skills.
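For readers unfamiliar with the model family, the analysis described above rests on Linacre's many-facet extension of the Rasch rating scale model. A sketch of its general form, with symbols chosen here to match the four facets named in the abstract (examinee, item, rater, prompt) rather than taken from the paper itself:

```latex
\log\!\left(\frac{P_{nijpk}}{P_{nijp(k-1)}}\right) = B_n - D_i - C_j - T_p - F_k
```

where $P_{nijpk}$ is the probability of examinee $n$ receiving a rating in category $k$ (rather than $k-1$) on item $i$ from rater $j$ under prompt $p$; $B_n$ is examinee ability, $D_i$ item difficulty, $C_j$ rater severity, $T_p$ prompt difficulty, and $F_k$ the difficulty of the step from category $k-1$ to $k$. Each facet is thus estimated on a common logit scale, which is what allows the relative magnitudes of the examinee, rater, item, and prompt facets to be compared directly.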
