If at First You Don’t Succeed, Try, Try Again

Previous research has considered sequential item response theory (SIRT) models for circumstances where examinees are allowed at least one opportunity to correctly answer questions. Research suggests that answer-until-correct assessment frameworks with partial feedback can promote student learning and improve score precision. This article describes SIRT models for cases in which test takers are allowed a finite number of repeated attempts on items. An overview of SIRT models is provided, and the Rasch SIRT model is discussed as a special case. Three applications are presented using assessment data from a calculus-based probability theory course. The first application estimates a Rasch SIRT model using marginal maximum likelihood and Markov chain Monte Carlo procedures. It finds that students with higher latent variable scores tend to have more knowledge and are better able to retrieve that knowledge in fewer attempts. The second application uses R to estimate growth-curve SIRT models that account for individual differences in content knowledge and in recovery/retrieval rates. The third application fits a multidimensional SIRT model that estimates an attempt-specific latent proficiency variable. Implications of SIRT models and answer-until-correct assessment frameworks for researchers, psychometricians, and test developers are discussed.
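To make the sequential structure concrete, here is a minimal Python sketch of item-level outcome probabilities under a Rasch-type sequential (continuation-ratio) model for answer-until-correct items. It assumes one common parameterization in which `betas[k]` is a hypothetical difficulty for succeeding on attempt k+1 given that all earlier attempts failed, with each conditional step following a Rasch (logistic) model in the latent variable `theta`. The article's own notation and parameterization may differ, and the function names are illustrative only.

```python
import math
from typing import List


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def rasch_sirt_probs(theta: float, betas: List[float]) -> List[float]:
    """Outcome probabilities for one item under an assumed sequential Rasch model.

    Assumption (illustrative): betas[k] is the difficulty of answering
    correctly on attempt k+1, conditional on attempts 1..k being incorrect.

    Returns a list of length K + 1 (K = len(betas)). Entry k (k < K) is the
    probability of first answering correctly on attempt k+1; the last entry is
    the probability of never answering correctly within the K allowed attempts.
    """
    probs = []
    still_wrong = 1.0  # probability that every earlier attempt was incorrect
    for beta in betas:
        p_step = logistic(theta - beta)  # conditional success on this attempt
        probs.append(still_wrong * p_step)
        still_wrong *= 1.0 - p_step
    probs.append(still_wrong)  # all K attempts failed
    return probs


def log_likelihood(theta: float, betas: List[float], attempts: int) -> float:
    """Log-probability of one observed outcome for one examinee on one item.

    `attempts` is the attempt (1..K) on which the examinee first answered
    correctly, or K + 1 if the item was never answered correctly.
    """
    return math.log(rasch_sirt_probs(theta, betas)[attempts - 1])


if __name__ == "__main__":
    # Hypothetical difficulties for an item allowing three attempts.
    betas = [0.0, 0.5, 1.0]
    for theta in (-1.0, 0.0, 1.0):
        probs = rasch_sirt_probs(theta, betas)
        print(theta, [round(p, 3) for p in probs], round(sum(probs), 6))
```

Under this parameterization the probabilities telescope to one, and larger values of `theta` shift probability toward earlier attempts, which matches the pattern the first application describes. Summing `log_likelihood` over examinees and items gives a conditional likelihood; a marginal maximum likelihood procedure would integrate it over an assumed distribution for `theta`, whereas MCMC would sample `theta` alongside the item parameters.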
