Empirical Assessment of Expertise (Special Section)

INTRODUCTION

All people depend on experts to make life safe (e.g., by providing basic resources) and interesting (e.g., by entertaining with music and art). Most people would claim to be experts, at least at something. But are they really expert? How is the claim to be substantiated? Experts have often been identified by self-proclamation or acclamation by other experts, as well as by experience, titles, and degrees. However, these methods can be misleading when searching for an expert. We prefer instead to cast the problem in empirical terms: An expert is someone who carries out a specified set of tasks expertly. Because it emphasizes behavior, this apparent tautology is not devoid of content. We propose to compare the job performance of candidate experts. In this paper we offer a new methodology for evaluating, on a relative basis, the degree of expertise demonstrated on a particular task.

At first glance, one might hope to evaluate expertise by looking at outcomes. The ideal is to correlate action with a gold standard, an unequivocally valid, universally accepted outcome measure that directly reflects the behavior under scrutiny. The expert surgeon's patients are more likely to survive than are those of the poor surgeon; the expert air traffic controller's planes are more likely to arrive safely. Survival and safe arrival seem to be relevant gold standards.

Where gold standards exist, there are well-established procedures for assessing expertise. When a judge makes dichotomous decisions, the correctness of which can be determined objectively, d' provides a measure of accuracy (Swets, 1986). For numerical responses, the Brier score (Brier, 1950) penalizes errors in relation to the square of their magnitudes. (A brief computational sketch of both measures follows at the end of this passage.)

The expert performance approach (Ericsson & Lehmann, 1996) has been used with considerable success in finding behavioral assessments that generalize and thus suggest expertise. Someone who excels when tested in the laboratory is likely to excel in other settings as well. A fast sprinter outruns slower counterparts under most conditions. A chess master will select superior moves in unfamiliar positions. Reproducible success in controlled settings predicts success in real-world applications.

When it is clear that an outcome measure captures expertise, it is appropriate to use it as a means to identify the expert. A potential problem is that a process may be more complex than use of the obvious outcome measure presupposes. Would it be surprising if the "best" surgeons generated poor survival rates? If patients and surgeons were randomly paired, medical outcome might be an effective assessment tool, but selection biases can render the correlation meaningless. A test that scales surgeons according to survival rates among their patients might be capturing the ability to attract easy cases rather than true surgical skill. The obvious gold standard may be tarnished.

One must be very careful to select tasks for which meaningful comparisons are feasible. In the laboratory the investigator can ask doctors or trainees to diagnose cases for which correct designations are known (Ericsson & Smith, 1991). In the field one might compare the success rates of emergency room physicians when patients are assigned to the first available doctor (Ericsson & Lehmann, 1996). In contrast to most medical settings, in this case the assignment of patient to practitioner can be regarded as essentially random.
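To make the two gold-standard measures concrete, here is a minimal sketch, assuming each judge's hit and false-alarm rates are known (for d') and that probability forecasts can be scored against observed 0/1 outcomes (for the Brier score). The function names and numbers are illustrative assumptions, not taken from the article.

```python
# Illustrative sketch of two accuracy measures usable when a gold
# standard exists: d' (signal detection) and the Brier score.
from statistics import NormalDist, mean

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between forecast probabilities and the
    0/1 outcomes that actually occurred (lower is better)."""
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))

# Hypothetical judge: 80% hits and 20% false alarms on dichotomous calls,
# plus four probability forecasts scored against known outcomes.
print(round(d_prime(0.80, 0.20), 2))                        # 1.68
print(round(brier_score([0.9, 0.7, 0.2, 0.6], [1, 1, 0, 0]), 3))  # 0.125
```

Both computations presuppose exactly the kind of verifiable outcome that the following paragraphs argue is often unavailable; without a gold standard, neither quantity can be formed.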
For many tasks at which experts make a living, no measurable outcome exists. How is one to know if the wine taster has judged accurately or if the professor has graded the essays well? Adherents of the expert performance approach would question the merits of studying such domains. Although there is no hint of an objective external criterion, we believe that some people do these tasks better than others and that people improve their performance. We would like our assessment scheme to include such expertise. …