Effects of Design Properties on Parameter Estimation in Large-Scale Assessments

The selection of an appropriate booklet design is an important element of large-scale assessments of student achievement. Two design properties that are typically optimized are balance with respect to the positions at which items are presented and balance with respect to the co-occurrence of pairs of items in the same booklet. The purpose of this study is to investigate the effects of these two design properties on the bias and root mean square error of item parameter estimates from the Rasch model. First, position effects were estimated using data from a large-scale assessment study measuring the science competencies of 19,107 ninth graders. These results were then used in a simulation study with 1,540 booklet designs in which position balance and cluster pair balance were systematically varied. The simulation results showed a small effect of position balance on the bias and root mean square error of the item parameter estimates, while the effect of cluster pair balance was negligible. This null effect is good news for test designers, since it allows the degree of cluster pair balance to be deliberately reduced without negative effects on item parameter estimates. However, aiming for high position balance is recommended when designing large-scale assessment studies.
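
The two outcome measures named above, bias and root mean square error, can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not the study's own code; the function name `bias_and_rmse` and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) of one item parameter across simulation replications.

    bias = mean(estimate - true value)
    RMSE = sqrt(mean((estimate - true value)^2))
    """
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    return bias, rmse


# Hypothetical data: four replications estimating one item difficulty
# whose generating (true) value is assumed to be 0.50.
estimates = [0.40, 0.60, 0.70, 0.50]
bias, rmse = bias_and_rmse(estimates, 0.50)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

In a simulation of this kind, such a function would typically be applied per item and per booklet design, so that the two measures can be compared across designs with different degrees of balance.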
