16 Item Response Theory and Measuring Abilities

Ability measurement changed markedly at the end of the twentieth century. For most of that century, measurement was guided by classical test theory (CTT), which rests on large-sample norms, fixed-length tests, and ordinal measurement scales: a person's ability is expressed as relative standing within a population of examinees on a given test. What CTT does not capture well is a person's performance, or change in performance, relative to the properties of particular items. Newer approaches, known as item response theory (IRT) methods, are replacing CTT techniques. In IRT, a person's performance may be referenced to specific item parameters as well as to relative standing in a population of persons. Persons can take different tests of varying lengths that measure the same unidimensional trait and still be compared meaningfully, provided the tests are linked or equated. Moreover, the IRT measurement scale is interval-level, so score levels and score changes carry consistent meaning across the scale. This chapter covers both classical test theory and item response theory approaches and discusses the strengths and weaknesses of each.

Keywords: ability measurement; classical test theory; item response theory; test theory
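As a concrete illustration (a sketch added here, not drawn from the chapter itself), a widely used IRT model, the two-parameter logistic (2PL), expresses the probability that a person with latent ability \theta answers item i correctly in terms of two item parameters, the discrimination a_i and the difficulty b_i:

    % 2PL item response function: probability of a correct response to item i
    % for a person at ability level \theta
    P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}

Because \theta and b_i are located on the same interval-level scale, a person's score can be referenced directly to item difficulty (the person passes item i with probability .5 when \theta = b_i), which is the property contrasted above with CTT's norm-referenced, ordinal scores.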
