Measuring clinician–machine agreement in differential diagnoses for dermatology

Artificial intelligence (AI) algorithms have generated significant interest as tools to assist in clinical workflows, particularly in image-based diagnostics such as melanoma detection.1,2 These algorithms typically answer narrowly scoped questions, such as "is this lesion malignant?" By contrast, dermatologists frequently tackle less structured diagnostic questions, such as "what is this rash?" In practice, evaluating a clinical case often involves integrating morphology, context, and history to produce a rank-ordered list of possible diagnoses, i.e., a differential diagnosis, rather than a binary "yes" or "no" answer. An AI algorithm could aid a less experienced clinician by providing its own differential diagnosis, which may highlight diagnoses the clinician had not considered and thereby help the clinician decide between additional evaluation and empiric treatment.