Graphical Models and Computerized Adaptive Testing

This paper synthesizes ideas from the fields of graphical modeling and educational testing, particularly Item Response Theory (IRT) applied to Computerized Adaptive Testing (CAT). Graphical modeling can offer IRT a language for describing multifaceted skills and knowledge and for disentangling evidence from complex performances. IRT-CAT, in turn, can offer graphical modelers several ways to handle sources of variability other than adding more variables to the model. In particular, variables can enter the modeling process at five levels: (1) in validity studies (but not in the model used operationally), (2) in task construction (in particular, in defining link parameters), (3) in test or model assembly (as blocking and randomization constraints on selecting tasks or other model pieces), (4) in response characterization (that is, as part of task models that characterize a response), or (5) in the main (student) model. The Graduate Record Examinations® (GRE®) are used to illustrate these ideas in the IRT-CAT setting, and extensions are discussed for language proficiency testing.
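Level (3) above, where a variable such as content area shapes which tasks are selected without ever entering the student model, can be made concrete with a small sketch. The Python below assumes a unidimensional two-parameter logistic (2PL) IRT model, a discretized standard-normal prior on ability θ updated by Bayes' rule, item selection by maximum Fisher information at the posterior mean, a per-area quota as the blocking constraint, and a random choice among the top-k most informative items as the randomization constraint. The item pool, parameter values, and function names (`run_cat`, `select_item`, and so on) are hypothetical illustrations, not the paper's method or the GRE's operational algorithm.

```python
import math
import random

# Hypothetical item pool: 2PL discrimination a, difficulty b, and a content
# "area" label used ONLY as a blocking constraint (level 3), never as a
# variable in the student model.
ITEM_POOL = [
    {"id": "q1", "a": 1.2, "b": -1.0, "area": "verbal"},
    {"id": "q2", "a": 0.8, "b": 0.0, "area": "verbal"},
    {"id": "q3", "a": 1.5, "b": 1.0, "area": "verbal"},
    {"id": "q4", "a": 1.0, "b": -0.5, "area": "quant"},
    {"id": "q5", "a": 1.3, "b": 0.5, "area": "quant"},
    {"id": "q6", "a": 0.9, "b": 1.5, "area": "quant"},
]

GRID = [-4.0 + 0.1 * i for i in range(81)]  # discretized theta from -4 to 4


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def prior():
    """Standard-normal prior over the theta grid, normalized to sum to 1."""
    w = [math.exp(-0.5 * t * t) for t in GRID]
    s = sum(w)
    return [x / s for x in w]


def update(post, item, correct):
    """Bayes' rule: multiply by the item likelihood, then renormalize."""
    new = []
    for t, w in zip(GRID, post):
        p = p_correct(t, item["a"], item["b"])
        new.append(w * (p if correct else 1.0 - p))
    s = sum(new)
    return [x / s for x in new]


def posterior_mean(post):
    return sum(t * w for t, w in zip(GRID, post))


def information(theta, item):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, item["a"], item["b"])
    return item["a"] ** 2 * p * (1.0 - p)


def select_item(post, administered, area_counts, max_per_area, rng, top_k=2):
    """Drop administered items and items whose area has hit its quota
    (blocking), rank the rest by information at the posterior mean, and
    pick at random among the top_k (randomization)."""
    theta_hat = posterior_mean(post)
    eligible = [
        it for it in ITEM_POOL
        if it["id"] not in administered
        and area_counts.get(it["area"], 0) < max_per_area
    ]
    if not eligible:
        return None
    eligible.sort(key=lambda it: information(theta_hat, it), reverse=True)
    return rng.choice(eligible[:top_k])


def run_cat(answer, test_length=4, max_per_area=2, seed=0):
    """Administer up to test_length items; answer(item) returns True/False."""
    rng = random.Random(seed)
    post = prior()
    administered, area_counts = [], {}
    for _ in range(test_length):
        item = select_item(post, administered, area_counts, max_per_area, rng)
        if item is None:
            break
        post = update(post, item, answer(item))
        administered.append(item["id"])
        area_counts[item["area"]] = area_counts.get(item["area"], 0) + 1
    return administered, posterior_mean(post), area_counts


if __name__ == "__main__":
    # Simulated examinee who answers every item correctly.
    ids, theta_hat, counts = run_cat(lambda it: True)
    print(ids, round(theta_hat, 2), counts)
```

The design point is that `area` never appears in `update` or `posterior_mean`: it changes which evidence is gathered, not how that evidence is interpreted, which is the distinction the abstract draws between level (3) and level (5).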
