Generalizability of Competency Assessment Scores Across and Within Clerkships: How Students, Assessors, and Clerkships Matter

Purpose
Many factors influence the reliable assessment of medical students' competencies in the clerkships. The purpose of this study was to determine how many clerkship competency assessment scores were necessary to achieve an acceptable threshold of reliability.

Method
Clerkship student assessment data were collected during the 2015-2016 academic year as part of the medical school assessment program at the University of Michigan Medical School. Faculty and residents assigned competency assessment scores to third-year core clerkship students. Generalizability (G) and decision (D) studies were conducted using balanced, stratified, random samples to examine the extent to which overall assessment scores could reliably differentiate between students' competency levels both within and across clerkships.

Results
In the across-clerkship model, the residual error accounted for the largest proportion of variance (75%), whereas the variance attributed to the student and student-clerkship effects was much smaller (7% and 10.1%, respectively). D studies indicated that generalizability estimates for eight assessors within a clerkship varied across clerkships (G coefficient range = 0.000-0.795). Within clerkships, the number of assessors needed for optimal reliability ranged from 4 to 17.

Conclusions
Minimal reliability was found in competency assessment scores for half of the clerkships. The variability in reliability estimates across clerkships may be attributable to differences in scoring processes and assessor training. Other medical schools face similar variation in assessments of clerkship students; therefore, the authors hope this study will serve as a model for other institutions that wish to examine the reliability of their clerkship assessment scores.
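As a rough guide to how the decision (D) studies translate variance components into reliability projections, the sketch below gives the standard generalizability coefficient for a simple student-by-assessor design with a single score per assessor. The design in this study is more complex (assessors vary within and across clerkships), so the formula and the worked numbers, which reuse the reported 7% student and 75% residual variance shares purely for illustration, are an approximation rather than the authors' exact model.

% D-study projection for a simple p (student) x r (assessor) design.
% n_r is the number of assessor scores averaged per student;
% sigma^2_p is the student (universe-score) variance and
% sigma^2_{pr,e} is the residual (student-by-assessor plus error) variance.
\[
  E\hat{\rho}^{2}(n_r) \;=\;
  \frac{\hat{\sigma}^{2}_{p}}
       {\hat{\sigma}^{2}_{p} + \hat{\sigma}^{2}_{pr,e} / n_r}
\]
% Illustration with the reported variance shares (student = 0.07,
% residual = 0.75) and n_r = 8 assessors:
%   E rho^2 ~= 0.07 / (0.07 + 0.75 / 8) ~= 0.43,
% well short of the 0.70 value often cited as an acceptable threshold.

Increasing the number of assessors shrinks only the residual term, which is why clerkships with a larger student variance share reach acceptable reliability with fewer assessors in the D-study projections.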
