Analysis of Cross-Cultural Comparability of PISA 2009 Scores

The Programme for International Student Assessment (PISA) is a large-scale cross-national study that measures the academic competencies of 15-year-old students in mathematics, reading, and science across more than 50 countries/economies around the world. PISA results are usually aggregated and presented in so-called “league tables,” in which countries are compared and ranked on each of the three scales. However, before comparing results obtained from different groups or countries, one must first establish that the tests measure the same competencies in all cultures. In this paper, this assumption is tested by examining the level of measurement equivalence in the 2009 PISA data set using an item response theory (IRT) approach and analyzing differential item functioning (DIF). Measurement inequivalence was found in the form of uniform DIF. Inequivalence occurred in a majority of the test questions on all three scales examined and is, on average, of moderate size; it varies considerably both across items and across countries. When this uniform DIF is accounted for in the inequivalent model, the resulting country scores change considerably on the “Mathematics,” “Science,” and especially the “Reading” scales. These changes tend to occur simultaneously and in the same direction within regional groups of countries. The most affected appear to be Southeast Asian countries/territories, whose scores, already among the highest under the initial, homogeneous model, increase further once inequivalence in the scales is accounted for.
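
To make the central concept concrete: uniform DIF means that an item's difficulty shifts by a constant amount for one group at every ability level. The paper detects and models this within an IRT framework; the snippet below is only a minimal, hypothetical sketch of the idea, using the well-known logistic-regression screen for uniform DIF on synthetic data. It is not the method used in the paper, and all names and numbers in it (sample size, item difficulty, the 0.5-logit shift) are assumptions chosen purely for illustration.

```python
# Minimal sketch of a uniform-DIF screen via logistic regression.
# This is NOT the paper's multigroup IRT analysis; all values below
# (sample size, difficulty, the 0.5-logit shift) are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=42)
n = 2000                        # examinees per group (hypothetical)
theta = rng.normal(size=2 * n)  # latent ability, identical across groups
group = np.repeat([0, 1], n)    # 0 = reference country, 1 = focal country

# Rasch-type item with uniform DIF: a constant 0.5-logit difficulty
# shift for the focal group, independent of ability.
b, dif = 0.0, 0.5
p = 1.0 / (1.0 + np.exp(-(theta - b - dif * group)))
y = rng.binomial(1, p)

# Regress the item response on ability and group membership. With real
# data the matching variable would be the observed total score, not theta.
X = sm.add_constant(np.column_stack([theta, group]))
fit = sm.Logit(y, X).fit(disp=0)

print(fit.params)   # coefficient on `group` ~ -0.5: the simulated DIF shift
print(fit.pvalues)  # a small p-value on `group` flags uniform DIF
```

A significant group effect at matched ability is exactly what "uniform DIF" denotes: the item is harder (or easier) for one country regardless of proficiency, which is why ignoring it biases the country-level scale scores on which league tables are built.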
