The Precision of Gain Scores Under an Item Response Theory Perspective: A Comparison of Asymptotic and Exact Conditional Inference About Change

The precision of simple difference or “gain” scores is described in terms of their confidence intervals on the latent trait scale and of significance probabilities under the null hypothesis H₀ of no change. To this end, two approaches are compared: one employs the asymptotic normal distribution of the maximum likelihood estimator of the person parameter; the other is based on the exact conditional distribution of the gain score, given the total number-correct score over the two time points. Either approach yields a detailed assessment of the precision of change measurements. For illustration, results are presented for three test scales. The present methods yield more relevant and much more detailed psychometric information than the traditional estimation of reliability as a sole indicator of measurement precision. Other areas of application, namely the comparison of the abilities of two examinees and the aggregation of individual significance levels within groups of examinees, are also mentioned.
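
The exact conditional approach can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model with known item difficulties at both time points, which is a simplification chosen for illustration and not necessarily the procedure used in the paper. Under that assumption, the distribution of the time-2 raw score, given the total raw score over both time points, depends only on the change parameter and not on the person's ability. The function names, the parameter `delta` and the difficulty values are hypothetical.

```python
import math


def esf(difficulties):
    """Elementary symmetric functions gamma_0..gamma_k of eps_i = exp(-beta_i)."""
    gamma = [1.0]
    for beta in difficulties:
        eps = math.exp(-beta)
        new = gamma + [0.0]
        # Summation recursion: gamma_r(new set) = gamma_r(old) + eps * gamma_{r-1}(old)
        for r in range(len(gamma), 0, -1):
            new[r] += eps * gamma[r - 1]
        gamma = new
    return gamma


def conditional_dist(beta1, beta2, total, delta=0.0):
    """P(r2 | r1 + r2 = total) under a Rasch model with change parameter delta.

    beta1, beta2: assumed known item difficulties at time 1 and time 2.
    delta = 0 corresponds to the null hypothesis of no change.
    """
    g1, g2 = esf(beta1), esf(beta2)
    weights = {}
    for r2 in range(len(g2)):
        r1 = total - r2
        if 0 <= r1 < len(g1):
            weights[r2] = g1[r1] * g2[r2] * math.exp(delta * r2)
    norm = sum(weights.values())
    return {r2: w / norm for r2, w in weights.items()}


def p_value_gain(beta1, beta2, r1, r2):
    """One-sided exact probability of a time-2 score >= r2 under no change."""
    dist = conditional_dist(beta1, beta2, r1 + r2)
    return sum(p for k, p in dist.items() if k >= r2)


# Hypothetical example: two items of difficulty 0 at each time point,
# raw score 0 at time 1 and 2 at time 2.
print(p_value_gain([0.0, 0.0], [0.0, 0.0], r1=0, r2=2))
```

Because the person parameter cancels out of the conditional distribution, the significance probability can be computed without estimating ability. A confidence interval for the amount of change could in principle be obtained by inverting the same distribution over `delta`, which this sketch does not do.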
