IRT Test Equating: Relevant Issues and a Review of Recent Research

The application of item response theory (IRT) methodology to test equating has been a research topic of considerable interest in the past 2 decades. Despite the volume of research, it has been difficult to draw conclusions and make generalizations because different studies have used different types of tests, different types of samples, and different methods for assessing the accuracy of equating results. The purpose of this paper is threefold: (a) to review some of the major studies thus far and synthesize their results, (b) to discuss what questions are as yet unanswered and what problems exist with research methodology, and (c) to provide direction for future research. Whereas earlier research focused on comparing equating methods and IRT models, recent research has addressed such statistical concerns as standard errors of equating, parameter stability, and robustness of IRT models to violations of their assumptions. A major finding from the research so far is that it is unreasonable to expect a single equating method to provide the best results for equating all types of tests. Future research must determine how conditions, such as multidimensionality and test content, affect IRT equating.
