Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making

Machine learning (ML) is increasingly used in image retrieval systems for medical decision-making. One application of ML is to retrieve visually similar medical images from past patients (e.g., tissue from biopsies) for reference when making a medical decision about a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that the algorithm judges to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved by a deep learning algorithm, and we developed tools that empower users to cope with the search algorithm on the fly by communicating which types of similarity matter most at different moments. In two evaluations with pathologists, we found that these tools increased both the diagnostic utility of the retrieved images and users' trust in the algorithm. Users preferred the tools over a traditional interface, with no loss in diagnostic accuracy. We also observed that users adopted new strategies when using the refinement tools, repurposing them to test and understand the underlying algorithm and to distinguish ML errors from their own. Taken together, these findings inform the design of future human-ML collaborative systems for expert decision-making.
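To make the general idea of similar-image retrieval concrete, here is a minimal sketch. It assumes that each image has already been mapped to a fixed-length embedding vector by some deep network and that similarity is measured by cosine similarity. The function names, the toy 3-dimensional vectors, and the refine_query step (nudging the query along a user-chosen direction) are all hypothetical illustrations. They are not the system or refinement tools described in this paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_similar(query: Sequence[float],
                     database: Dict[str, Sequence[float]],
                     k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (image_id, score) pairs most similar to the query, best first."""
    scored = [(image_id, cosine(query, emb)) for image_id, emb in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def refine_query(query: Sequence[float],
                 direction: Sequence[float],
                 weight: float) -> List[float]:
    """Hypothetical refinement: shift the query embedding along a user-chosen direction."""
    return [q + weight * d for q, d in zip(query, direction)]


# Toy embeddings standing in for previously archived cases (hypothetical data).
database = {
    "case_a": [1.0, 0.0, 0.0],
    "case_b": [0.9, 0.1, 0.0],
    "case_c": [0.0, 1.0, 0.0],
    "case_d": [0.5, 0.5, 0.0],
}
query = [1.0, 0.2, 0.0]

print([image_id for image_id, _ in retrieve_similar(query, database, k=2)])
# prints ['case_b', 'case_a']

# Emphasize the second embedding dimension, as a user steering the search might.
refined = refine_query(query, [0.0, 1.0, 0.0], 1.0)
print([image_id for image_id, _ in retrieve_similar(refined, database, k=2)])
# prints ['case_d', 'case_c']
```

The point of the second query is that the same database yields a different ranking once the query is steered. This illustrates, in a simplified form, why a single fixed notion of "similar" may not match what a user needs at a given moment.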
