The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings

The increasing demand for performance assessment of speaking skills in second languages has led to logistic complications, for example, the delivery of tests in overseas locations. One solution to the problem has been to train native speaker interlocutors to carry out a series of oral interactions with the candidate, with assessment from audiorecordings of the test session postponed and conducted centrally by a small team of trained raters. This technique is currently used in two large-scale occupationally related ESP tests administered internationally on behalf of the Australian government. But these procedures raise questions about the effect of such facets of the assessment situation as interlocutor variables and the quality of the audiotape recording. Recent developments in multifaceted Rasch measurement have significantly broadened the possibilities for investigation of these issues. The research presented in this article investigates potential problems associated with the above approach to the offshore testing of speaking skills. Data from audiotape-based assessments of 70 offshore candidates from two administrations of the Occupational English Test, an advanced-level ESP test for health professionals, are considered. In addition to multiple ratings of candidate performance, each recording is rated for perceptions of the competence of the interlocutor, the rapport established between the candidate and the interlocutor, and the audibility of the interaction. These aspects of the assessment situation are treated as facets in a series of multifaceted Rasch analyses of the data. The results of the analysis reveal the effects of interlocutor variability and audiotape quality on ratings. The article concludes with an evaluation of the overall feasibility of the procedure, and implications for test administration are considered. The study is also a further demonstration of the application of multifaceted Rasch measurement in performance assessment settings.