SPECIAL ARTICLE: Cognitive, Social and Environmental Sources of Bias in Clinical Performance Ratings

Background: Global ratings based on observation of convenience samples of clinical performance form the primary basis for appraising the clinical competence of medical students, residents, and practicing physicians. This review explores cognitive, social, and environmental factors that introduce unwanted score variation (bias) into clinical performance evaluations.

Summary: Raters hold a one- or two-dimensional conception of clinical performance and do not recall performance details. Good news is reported more quickly and more fully than bad news, which leads to overly generous performance evaluations. Training has little impact on the accuracy and reproducibility of clinical performance ratings.

Conclusions: Clinical performance evaluation systems should ensure broad, systematic sampling of clinical situations; keep rating instruments short; encourage immediate feedback for teaching and learning purposes; encourage raters to keep written performance notes to support delayed clinical performance ratings; give raters feedback about their ratings; supplement formal observation with unobtrusive observation; make promotion decisions through group review; supplement traditional observation with other measures of clinical skill (e.g., the Objective Structured Clinical Examination); encourage ratings of specific performances rather than global ratings; and establish the meaning of ratings in the same way that normal limits are set for clinical diagnostic investigations.
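The final recommendation borrows the logic of laboratory reference ranges, where normal limits are commonly set as the central 95% of results from a reference population. The following is a minimal Python sketch of that idea applied to ratings. It assumes a nonparametric (percentile-based) interval, a 95% coverage default, and a reference group consisting of ratings of performers already judged competent. These choices, the function names, and any rating data are hypothetical illustrations, not procedures described in the review.

```python
from typing import Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile by linear interpolation between order statistics
    (the same convention as NumPy's default), for 0 <= p <= 100."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100  # fractional rank of the percentile
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


def reference_limits(reference_ratings: Sequence[float],
                     coverage: float = 95.0) -> Tuple[float, float]:
    """Hypothetical helper: lower and upper limits enclosing the central
    `coverage` percent of a reference group's ratings."""
    tail = (100 - coverage) / 2
    return (percentile(reference_ratings, tail),
            percentile(reference_ratings, 100 - tail))


def classify(rating: float, limits: Tuple[float, float]) -> str:
    """Interpret a single rating against the reference limits."""
    low, high = limits
    if rating < low:
        return "below reference limits"
    if rating > high:
        return "above reference limits"
    return "within reference limits"


if __name__ == "__main__":
    # Illustrative ratings on an assumed 1-9 scale; not data from the review.
    reference = [5, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9]
    limits = reference_limits(reference)
    print(limits, classify(4, limits))
```

Under this approach a new rating is read against the reference group's limits rather than as an absolute number. Percentile limits estimated from small reference groups are unstable, so a real application would need a much larger reference sample than the toy list above.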
