Assessing the Impact of Automated Suggestions on Decision Making: Domain Experts Mediate Model Errors but Take Less Initiative

Automated decision support can accelerate tedious tasks by letting users focus their attention where it is needed most. A key concern, however, is whether users overly trust automation or cede agency to it. In this paper, we investigate the effects of introducing automation to the annotation of clinical texts — a multi-step, error-prone task of identifying clinical concepts (e.g., procedures) in medical notes and mapping them to labels in a large ontology. We consider two forms of decision aid: recommending which labels to map concepts to, and pre-populating annotation suggestions. Through laboratory studies with 18 clinicians, we find that these expert users generally build intuition about when to rely on automation and when to exercise their own judgement. However, when presented with fully pre-populated suggestions, they exhibit less agency: they accept improper mentions and take less initiative in creating additional annotations. Our findings inform how systems and algorithms should be designed to mitigate the observed issues.
