Classification Accuracy of Mixed Format Tests: A Bi-Factor Item Response Theory Approach

Mixed-format tests (e.g., tests consisting of multiple-choice [MC] items and constructed-response [CR] items) have become increasingly popular. However, the latent structure of item pools that combine the two formats remains equivocal, and its implications are unclear: for example, do constructed-response items tap reasoning skills that cannot be assessed with multiple-choice items? This study explored the dimensionality of mixed-format tests by applying bi-factor models to 10 tests of various subjects from the College Board's Advanced Placement (AP) Program and compared the accuracy of scores derived from the bi-factor analysis with that of scores derived from a unidimensional analysis. In particular, the study addressed a practical question: the classification accuracy of the overall grade on a mixed-format test. Our findings revealed that the degree of multidimensionality arising from the mixed item format varied from subject to subject, as reflected in the disattenuated correlation between scores on the MC and CR subtests. Moreover, the decrements in classification accuracy under the unidimensional analysis were remarkably small whenever the disattenuated correlation exceeded 0.90.
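
Because these findings turn on the disattenuated correlation between the MC and CR subtests, a brief worked illustration of that quantity may be useful. The sketch below applies the standard Spearman correction for attenuation, dividing the observed subtest correlation by the square root of the product of the two subtest reliabilities; the function name and the numeric values are hypothetical examples, not figures from the AP data.

import math

def disattenuated_correlation(r_obs: float, rel_mc: float, rel_cr: float) -> float:
    # Spearman correction for attenuation: estimate the correlation
    # between the true scores of two subtests from their observed
    # correlation and their reliabilities.
    return r_obs / math.sqrt(rel_mc * rel_cr)

# Hypothetical values for illustration only (not from the AP data):
# observed MC-CR correlation 0.72, reliabilities 0.88 (MC) and 0.75 (CR).
r_true = disattenuated_correlation(0.72, 0.88, 0.75)
print(f"disattenuated correlation = {r_true:.3f}")  # ~0.886

By the benchmark reported above, a subject whose corrected MC-CR correlation falls below 0.90, as in this hypothetical example, is the kind of case in which a unidimensional analysis risks a larger loss of classification accuracy.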
