Rating and ranking preparedness characteristics important for veterinary workplace clinical training: a novel application of pairwise comparisons and the Elo algorithm

Quantitatively eliciting perspectives about a large number of similar entities (such as a list of competences) is a challenge for researchers in health professions education (HPE). Traditional survey methods often rely on Likert items. However, a Likert item approach that generates absolute ratings of the entities may suffer from the “ceiling effect,” in which ratings cluster at one end of the scale. This limits researchers’ ability to detect differences in ratings between the entities themselves and between respondent groups. This paper describes the use of pairwise comparison (“this or that?”) questions and a novel application of the Elo algorithm to generate relative ratings and rankings of a large number of entities on a unidimensional scale. A study assessing the relative importance of 91 student “preparedness characteristics” for veterinary workplace clinical training (WCT) is presented as an example of this method in action. The Elo algorithm uses pairwise comparison responses to generate an importance rating for each preparedness characteristic on a scale from zero to one. These ratings are continuous data with measurement variability which, by definition, span an entire spectrum and are not susceptible to the ceiling effect. The output should allow for the detection of differences in perspectives between groups of survey respondents (such as students and workplace supervisors) to which Likert ratings may be insensitive. Pairwise comparisons have additional advantages: they have low susceptibility to systematic bias and measurement error, they can be quicker and arguably more engaging to complete than Likert items, and they should carry a low cognitive load for respondents. Methods for evaluating the validity and reliability of this survey design are also described. This paper presents a method that holds great potential for a diverse range of applications in HPE research.
In the pursuit of quantifying perspectives on survey items that are measured on a relative basis and a unidimensional scale (e.g., importance, priority, probability), this method is likely to be a valuable option.
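
As a rough illustration of how pairwise responses can be turned into ratings on a zero-to-one scale, the following minimal Python sketch applies the standard Elo update to a list of (preferred, not preferred) responses and then rescales the results. It is not the procedure used in the study: the K-factor of 32, the starting rating of 0, the logistic scale of 400, the min-max rescaling to the unit interval, and the item labels "A", "B" and "C" are all assumptions made for illustration.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expectation that item a is preferred over item b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_ratings(items, comparisons, k=32.0, start=0.0):
    """Update ratings once per (preferred, not_preferred) response, in order.

    k and start are illustrative defaults, not values taken from the paper.
    """
    ratings = {item: start for item in items}
    for winner, loser in comparisons:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_w)   # surprising wins move ratings further
        ratings[winner] += delta
        ratings[loser] -= delta   # zero-sum: total rating is conserved
    return ratings


def rescale_unit(ratings):
    """Min-max rescale to [0, 1] (an assumed normalisation step)."""
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:
        return {item: 0.5 for item in ratings}
    return {item: (r - lo) / (hi - lo) for item, r in ratings.items()}


def rank(ratings):
    """Items ordered from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical items and responses: each tuple is (preferred, not_preferred).
items = ["A", "B", "C"]
responses = [("A", "B"), ("A", "C"), ("B", "C")]

raw = elo_ratings(items, responses)
unit = rescale_unit(raw)
order = rank(unit)
print(order)
print(unit)
```

Because each Elo update depends on the current ratings, the result is sensitive to the order in which responses are processed. Averaging ratings over many random orderings of the responses is one common way to reduce that sensitivity, though this sketch does not do so.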
