A Visualization Approach for Rapid Labeling of Clinical Notes for Smoking Status Extraction

Labeling is typically the most human-intensive step in developing supervised learning models. In this paper, we propose a simple, easy-to-implement visualization approach that reduces cognitive load and increases the speed of text labeling. The approach is tailored to the task of extracting patient smoking status from clinical notes. It consists of three steps: ordering the sentences that mention smoking, centering each sentence on its smoking-related token, and annotating the text to emphasize its informative parts. Our experiments on clinical notes from the MIMIC-III clinical database show that this visualization approach enables human annotators to label sentences up to 3 times faster than with a baseline approach.
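The three steps (ordering, centering, annotating) can be illustrated as a keyword-in-context (KWIC) view. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the smoking lexicon `SMOKING_RE`, the cue-word set `CUE_WORDS`, the ordering rule (sorting by the words immediately to the left of the smoking token), and the `*...*` emphasis markers are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical smoking lexicon; the paper's actual keyword list is not given here.
SMOKING_RE = re.compile(r"\b(smok\w*|tobacco|cigarettes?|cigars?|nicotine)\b", re.IGNORECASE)
# Hypothetical cue words to emphasize (negation and temporal status cues).
CUE_WORDS = {"denies", "no", "never", "quit", "former", "current", "ppd"}


@dataclass
class KwicLine:
    left: str     # text before the smoking token
    keyword: str  # the matched smoking token
    right: str    # text after the smoking token


def to_kwic(sentence: str) -> Optional[KwicLine]:
    """Split a sentence around its first smoking mention; None if there is none."""
    m = SMOKING_RE.search(sentence)
    if m is None:
        return None
    return KwicLine(sentence[: m.start()].strip(), m.group(0), sentence[m.end():].strip())


def sort_key(line: KwicLine) -> tuple:
    """Order by the words nearest the keyword on the left (assumed rule), so
    phrasings such as 'denies smoking' or 'quit smoking' end up adjacent."""
    nearest_first = list(reversed(line.left.lower().split()))
    return (nearest_first, line.keyword.lower(), line.right.lower())


def emphasize(text: str) -> str:
    """Wrap hypothetical cue words in *...* so they stand out in plain text."""
    def mark(m: re.Match) -> str:
        word = m.group(0)
        return f"*{word}*" if word.lower() in CUE_WORDS else word
    return re.sub(r"\b\w+\b", mark, text)


def render(sentences: List[str], width: int = 30) -> List[str]:
    """Keep smoking sentences, order them, center them on the token, annotate."""
    lines = [k for k in map(to_kwic, sentences) if k is not None]
    lines.sort(key=sort_key)
    rows = []
    for k in lines:
        # Truncate after emphasizing so every keyword starts in the same column.
        left = emphasize(k.left)[-width:]
        right = emphasize(k.right)[:width]
        rows.append(f"{left:>{width}} [{k.keyword.upper()}] {right}")
    return rows


if __name__ == "__main__":
    notes = [
        "Patient denies smoking or alcohol use.",
        "He quit smoking 10 years ago.",
        "Former smoker, 20 pack-years.",
        "No history of tobacco use.",
        "Lungs clear bilaterally.",
    ]
    for row in render(notes):
        print(row)
```

In this sketch the sentence without a smoking mention is dropped, and the remaining rows share one keyword column. Sorting on the left context is one plausible way to place sentences with similar phrasing (for example, negations such as "denies smoking") next to each other so an annotator can label a run of similar rows quickly; sorting on the right context would be an equally reasonable alternative.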
