Deep Learning for the Digital Pathologic Diagnosis of Cholangiocarcinoma and Hepatocellular Carcinoma: Evaluating the Impact of a Web-based Diagnostic Assistant

While artificial intelligence (AI) algorithms continue to rival human performance on a variety of clinical tasks, the question of how best to incorporate these algorithms into clinical workflows remains relatively unexplored. We investigated how AI assistance affects pathologist performance on the task of differentiating between two subtypes of primary liver cancer, hepatocellular carcinoma (HCC) and cholangiocarcinoma (CC). We developed an AI diagnostic assistant using a deep learning model and evaluated its effect on the diagnostic performance of eleven pathologists with varying levels of expertise. Our deep learning model achieved an accuracy of 0.885 on an internal validation set of 26 slides and an accuracy of 0.842 on an independent test set of 80 slides. Despite the model's high accuracy on the held-out test set, the diagnostic assistant did not significantly improve performance across pathologists (p-value: 0.184, OR: 1.287 (95% CI 0.886, 1.871)). Model correctness, however, significantly biased the pathologists' decisions. When the model was correct, assistance significantly improved accuracy across all pathologist experience levels and for all case difficulty levels (p-value < 0.001, OR: 4.289 (95% CI 2.360, 7.794)). When the model was incorrect, assistance significantly decreased accuracy across all eleven pathologists and for all case difficulty levels (p-value < 0.001, OR: 0.253 (95% CI 0.126, 0.507)). Our results highlight the challenges of translating AI models to the clinical setting, especially for difficult subspecialty tasks such as tumor classification. In particular, they suggest that incorrect model predictions could strongly bias an expert's diagnosis, an important factor to consider when designing medical AI-assistance systems.
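The reported odds ratios, confidence intervals, and p-values can be cross-checked against one another. The sketch below is not the authors' analysis: it assumes each 95% interval is a Wald interval that is symmetric on the log-odds scale, recovers the implied standard error, and derives a two-sided normal p-value from it. The function name `wald_from_or_ci` and the `reported` dictionary are hypothetical helpers introduced only for illustration.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover (SE, z, two-sided p) from an odds ratio and its 95% CI.

    Assumption (not stated in the abstract): the interval is a Wald
    interval, i.e. log(OR) +/- z_crit * SE on the log-odds scale.
    """
    log_or = math.log(odds_ratio)
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values copied from the abstract: (OR, CI lower bound, CI upper bound).
reported = {
    "overall": (1.287, 0.886, 1.871),
    "model correct": (4.289, 2.360, 7.794),
    "model incorrect": (0.253, 0.126, 0.507),
}

results = {name: wald_from_or_ci(*vals) for name, vals in reported.items()}

for name, (se, z, p) in results.items():
    print(f"{name}: SE={se:.3f}, z={z:.2f}, p={p:.2g}")
```

Under this assumption the overall effect gives a p-value close to the reported 0.184, and both conditional effects fall well below 0.001, which is consistent with the figures quoted above. Small discrepancies are expected because the inputs are rounded and the original p-values may come from a different test.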
