A meta-analysis of test format effects on reading and listening test performance: Focus on multiple-choice and open-ended formats

This meta-analysis examined the effects of multiple-choice and open-ended formats on L1 reading, L2 reading, and L2 listening test performance. Mean effect sizes for test format were estimated from fifty-six data sources located through an extensive search of the literature. Results from the mixed effects model of meta-analysis indicate that multiple-choice formats are easier than open-ended formats in L1 reading and L2 listening, with the format effect ranging from small to large in L1 reading and from medium to large in L2 listening. In L2 reading, no overall format effect is found, although multiple-choice formats are easier than open-ended formats when any one of the following four conditions is met: the study uses a between-subjects design, random assignment, or stem-equivalent items, or involves learners with high L2 proficiency. Across the three domains, format effects favoring multiple-choice formats are consistently observed when studies employ between-subjects designs, random assignment, or stem-equivalent items.
