"Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making

Although rapid advances in machine learning have made it increasingly applicable to expert decision-making, the delivery of accurate algorithmic predictions alone is insufficient for effective human-AI collaboration. In this work, we investigate the key types of information medical experts desire when they are first introduced to a diagnostic AI assistant. In a qualitative lab study, we interviewed 21 pathologists before, during, and after presenting them with deep neural network (DNN) predictions for prostate cancer diagnosis, to learn what information they wanted about the AI assistant. Our findings reveal that, far beyond understanding the local, case-specific reasoning behind any model decision, clinicians desired upfront information about basic, global properties of the model, such as its known strengths and limitations, its subjective point of view, and its overall design objective (what it is designed to be optimized for). Participants compared these information needs to the collaborative mental models they develop of their medical colleagues when seeking a second opinion: the medical perspectives and standards that those colleagues embody, and the compatibility of those perspectives with their own diagnostic patterns. These findings broaden and enrich discussions surrounding AI transparency for collaborative decision-making, providing a fuller understanding of what experts find important in their introduction to AI assistants before integrating them into routine practice.
