To the Editor: Artificial intelligence (AI) systems are now regularly used in medical settings,1 although regulatory oversight remains inconsistent and underdeveloped.2,3 Safe deployment of clinical AI requires informed clinician-users, who are generally responsible for identifying and reporting emerging problems; clinicians may also serve as administrators in governing the use of clinical AI. A natural question follows: are clinicians adequately prepared to identify circumstances in which AI systems fail to perform their intended function reliably?

A major driver of AI system malfunction is known as "dataset shift."4,5 Most clinical AI systems today rely on machine learning, in which algorithms use statistical methods to learn key patterns from clinical data. Dataset shift occurs when a machine-learning system underperforms because of a mismatch between the data set on which it was developed and the data on which it is deployed.4 For example, the University of Michigan Hospital implemented the widely used sepsis-alerting model developed by Epic Systems; in April 2020, the model had to be deactivated because of spurious alerting owing to changes in patients' demographic characteristics associated with the coronavirus disease 2019 pandemic. In this case, dataset shift had fundamentally altered the relationship between fevers and bacterial sepsis, leading the hospital's clinical AI governing committee (which one of the authors of this letter chairs) to decommission its use.

This is an extreme example; many causes of dataset shift are more subtle. In Table 1, we present common causes of dataset shift, grouped into changes in technology (e.g., new software vendors), changes in population and setting (e.g., shifting patient demographics), and changes in behavior (e.g., new reimbursement incentives); the list is not meant to be exhaustive.

Successful recognition and mitigation of dataset shift require both vigilant clinicians and sound technical oversight through AI governance teams.4,5 When using an AI system, clinicians should note misalignment between the model's predictions and their own clinical judgment, as in the sepsis example above. Clinicians who use AI systems should also regularly consider whether relevant aspects of their own clinical practice are atypical or have recently changed. For their part, AI governance teams must ensure that clinicians can easily report concerns about the function of AI systems and must provide feedback so that the reporting clinician knows the concern has been registered and, when appropriate, that mitigating actions have been taken. Teams must also establish AI monitoring and updating protocols that integrate technical solutions.
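As an illustrative sketch only (not part of the letter), a governance team's monitoring protocol might routinely compare the distribution of incoming model inputs against the data the model was developed on and escalate discrepancies for clinical review. The example below uses a two-sample Kolmogorov–Smirnov test as one hypothetical drift check; the feature names, data, and alert threshold are assumptions introduced for illustration.

```python
# Illustrative sketch: a minimal dataset-shift check comparing deployment-time
# feature distributions against development data. Feature names, data, and the
# alert threshold are hypothetical assumptions, not taken from the letter.
import numpy as np
from scipy.stats import ks_2samp


def detect_feature_shift(dev_data, live_data, features, p_threshold=0.01):
    """Flag features whose deployment distribution differs from development.

    dev_data, live_data: dicts mapping feature name -> 1-D numpy array.
    Returns a list of (feature, p_value) pairs falling below the threshold.
    """
    flagged = []
    for feat in features:
        _, p_value = ks_2samp(dev_data[feat], live_data[feat])
        if p_value < p_threshold:
            flagged.append((feat, p_value))
    return flagged


# Hypothetical scenario echoing the sepsis-alert example: temperature readings
# shift when many febrile, non-septic patients present during a pandemic wave.
rng = np.random.default_rng(0)
dev = {"temperature_c": rng.normal(37.0, 0.6, 5000)}
live = {"temperature_c": rng.normal(37.8, 0.9, 5000)}

for feat, p in detect_feature_shift(dev, live, ["temperature_c"]):
    print(f"Possible dataset shift in '{feat}' (KS test p = {p:.2e}); "
          "escalate to the AI governance team for review.")
```

Such a check is only a starting point; distribution-level alerts still require clinician review to decide whether the shift is clinically meaningful and whether the model should be retrained or decommissioned.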
[1] Saria S, et al. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics, 2019.
[2] Saria S, et al. Tutorial: Safe and Reliable Machine Learning. arXiv, 2019.
[3] Borgwardt K, et al. Machine Learning in Medicine. Mach. Learn. under Resour. Constraints Vol. 3, 2015.