Peer Assessment of Aviation Performance: Inconsistent for Good Reasons

Research on expertise is relatively common in cognitive science and spans many domains. However, much less research has examined how experts within the same domain assess the performance of their peers. We report the results of a modified think-aloud study conducted with 18 pilots (6 first officers, 6 captains, and 6 flight examiners). Pairs of same-ranked pilots were asked to rate the performance of a captain flying in a critical pre-recorded simulator scenario. Findings reveal (a) considerable variance within performance categories, (b) differences in the process used as evidence in support of a performance rating, (c) different numbers and types of facts (cues) identified, and (d) differences in how specific performance events affect the choice of performance category and the gravity of the performance assessment. Such variance is consistent with low inter-rater reliability. Because raters offered good, albeit imprecise, reasons and facts, a fuzzy mathematical model of performance rating was developed. The model agrees well with the observed variations.
