FOUNDATIONS OF A NEW TEST THEORY

It is only a slight exaggeration to describe the test theory that dominates educational measurement today as the application of twentieth-century statistics to nineteenth-century psychology. Sophisticated estimation procedures, new techniques for missing-data problems, and theoretical advances into latent-variable modeling have appeared, all applied with psychological models that explain problem-solving ability in terms of a single, continuous variable. This caricature suffices for many practical prediction and selection problems because it expresses patterns in data that are pertinent to the decisions that must be made. It falls short for placement and instruction problems based on students' internal representations of systems, problem-solving strategies, or reconfigurations of knowledge as they learn. Such applications demand different caricatures of ability: more realistic ones that can express patterns suggested by recent developments in cognitive and educational psychology. The application of modern statistical methods with modern psychological models constitutes the foundation of a new test theory.
