Deriving oral assessment scales across different tests and rater groups

The purpose of this study is to derive the criteria/dimensions underlying learners' L2 oral ability scores across three tests: an oral interview, a narration and a read-aloud. A stimulus tape of 18 speech samples was presented to three native-speaker rater groups for evaluation: teachers of Arabic as a foreign language in the USA, nonteaching Arabs who had resided in the USA for at least one year and nonteaching Arabs living in their home country (Lebanon). Each rater provided a holistic score for every speech sample. The holistic scores were analysed using the INDSCAL multidimensional scaling model. Results showed that the nonmetric three-dimensional solution provided a good fit to the data. Both regression and speech-sample analyses were employed to identify those dimensions. Additionally, subject weights indicated that the three rater groups emphasized the three dimensions differentially, demonstrating that native-speaker groups with varied backgrounds perceive the L2 oral construct differently. The study contends that researchers might need to reconsider employing generic component scales. A research approach that derives scales empirically according to the given tests and audiences, and according to the purpose of assessment, is recommended. Finally, replicating this study with other languages, L2 oral ability levels, tests and rater groups is suggested.
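The analysis reported here pairs individual-differences multidimensional scaling of holistic scores with follow-up regressions used to interpret the derived dimensions. The sketch below illustrates that general workflow only; it substitutes scikit-learn's nonmetric MDS for a true INDSCAL fit (no standard Python INDSCAL routine is assumed), and the rater-group names, the holistic scores and the external criterion are hypothetical placeholders rather than the study's data.

```python
# Minimal sketch (not the authors' INDSCAL analysis): fit a nonmetric MDS
# to rater-derived dissimilarities, then regress a hypothetical external
# criterion onto the recovered dimensions to aid interpretation.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_samples = 18                      # 18 speech samples, as in the study

# Hypothetical holistic scores (rows = raters, cols = speech samples)
# for three rater groups; real data would come from the rating sessions.
groups = {
    "teachers": rng.integers(1, 10, size=(10, n_samples)).astype(float),
    "nonteachers_usa": rng.integers(1, 10, size=(10, n_samples)).astype(float),
    "nonteachers_lebanon": rng.integers(1, 10, size=(10, n_samples)).astype(float),
}

def dissimilarity(scores):
    """Mean absolute score difference between every pair of speech samples."""
    diffs = np.abs(scores[:, :, None] - scores[:, None, :])   # (raters, n, n)
    return diffs.mean(axis=0)

# Average the group dissimilarity matrices into one stimulus matrix.
D = np.mean([dissimilarity(s) for s in groups.values()], axis=0)

# Nonmetric three-dimensional MDS solution (stand-in for INDSCAL).
mds = MDS(n_components=3, metric=False, dissimilarity="precomputed",
          random_state=0, n_init=8)
config = mds.fit_transform(D)        # (18, 3) coordinates of the samples
print("stress:", round(mds.stress_, 3))

# Interpretation step: regress an external criterion (here a placeholder
# accuracy rating per sample) on the three derived dimensions.
accuracy = rng.normal(size=n_samples)
X = np.column_stack([np.ones(n_samples), config])
beta, *_ = np.linalg.lstsq(X, accuracy, rcond=None)
print("regression weights on dimensions 1-3:", np.round(beta[1:], 2))
```

In the study itself, the subject (rater-group) weights produced by INDSCAL carry the substantive finding; the sketch above recovers only a common stimulus space and so should be read as an illustration of the scaling-plus-regression logic, not of the weighting result.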
