The benefits of sequential testing: Improved diagnostic accuracy and better outcomes for failing students

Abstract

Introduction: In recent decades, there has been a move towards standardized models of assessment in which all students sit the same test (e.g. the OSCE). By contrast, in a sequential test the examination is in two parts: a “screening” test (S1) that all candidates take, and a second test (S2) that only the weaker candidates sit. This article investigates the diagnostic accuracy of this assessment design and examines failing students’ subsequent performance under this model.

Methods: Using recent undergraduate knowledge and performance data, we compare S1 “decisions” with overall sequential (S1 plus S2) pass/fail decisions to assess the diagnostic accuracy of the sequential model. We also evaluate the longitudinal performance of failing students, using changes in percentile rank across a full repeated year.

Findings: We find a small but important improvement in diagnostic accuracy under the sequential model: of the order of 2–4% of students would be misclassified under a traditional single-test model. Furthermore, after a resit year, weaker students’ rankings relative to their peers improve by 20–30 percentile points.

Discussion: These findings provide strong empirical support for the theoretical arguments in favor of a sequential model of assessment, in particular that it improves both diagnostic accuracy and longitudinal, post-remediation outcomes for the weakest students.
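To make the two analyses concrete, the following is a minimal Python sketch using simulated scores. All numbers, the cut-scores (S1_CUT, SEQ_CUT), the score-pooling rule, and the cohort sizes are illustrative assumptions, not the study’s actual data or standard-setting procedures. The first part counts students whose S1-only decision would disagree with the pooled sequential decision; the second illustrates a percentile-rank change for one resitting student.

```python
# Minimal sketch of the two analyses described in the abstract, on simulated
# data. All parameters below are illustrative assumptions, not the study's
# actual scoring model.
import numpy as np

rng = np.random.default_rng(0)
n_students = 500

# Hypothetical "true" ability and a noisy S1 (screening) score, percentage scale.
ability = rng.normal(60, 10, n_students)
s1 = ability + rng.normal(0, 5, n_students)

S1_CUT = 55.0  # assumed S1 pass mark
s1_pass = s1 >= S1_CUT

# Only S1 failures sit S2; the sequential decision here uses a simple mean of
# S1 and S2 against an assumed pooled cut-score.
n_fail = int((~s1_pass).sum())
s2 = ability[~s1_pass] + rng.normal(0, 5, n_fail)
pooled = (s1[~s1_pass] + s2) / 2
SEQ_CUT = 55.0  # assumed pooled pass mark

seq_pass = s1_pass.copy()
seq_pass[~s1_pass] = pooled >= SEQ_CUT

# "Misclassification under a traditional model": students whose S1-only
# outcome disagrees with the longer, more reliable sequential outcome.
print(f"S1-only vs sequential disagreement: {(s1_pass != seq_pass).mean():.1%}")

# Percentile-rank change across a repeated year for one hypothetical
# resitting student, ranked against each year's cohort.
year1_cohort = rng.normal(60, 10, 300)
year2_cohort = rng.normal(60, 10, 300)
score_y1, score_y2 = 45.0, 58.0  # assumed scores before/after the resit year

pct_y1 = (year1_cohort < score_y1).mean() * 100
pct_y2 = (year2_cohort < score_y2).mean() * 100
print(f"Percentile rank change: {pct_y2 - pct_y1:+.0f} points")
```

Under these assumed parameters the disagreement rate comes out in the low single digits of percent, which is the kind of figure the abstract’s 2–4% estimate describes; the real analysis, of course, rests on the cohort’s actual scores and standard-setting procedures rather than simulated ones.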
