Automatic Identification of Substance Abuse from Social History in Clinical Text

Substance abuse poses many health risks. Tobacco use, for example, increases the rates of many diseases, such as coronary heart disease and lung cancer. Clinical notes contain rich information detailing the history of substance abuse from the caregiver's perspective. We present work on the automatic identification of substance abuse from clinical text. We created a publicly available dataset annotated for three types of substance abuse (tobacco, alcohol, and drug), with seven entity types per event: status, type, method, amount, frequency, exposure-history, and quit-history. Using a combination of machine learning and natural language processing approaches, our results on an unseen test set range from 0.51 to 0.58 F1 on stringent, full-event identification, and from 0.80 to 0.91 F1 for identification of the substance abuse event and its status. These results indicate the feasibility of extracting detailed substance abuse information from clinical records.
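
To illustrate the difference between the two evaluation settings reported above, the following is a minimal sketch, not the evaluation code used in this work. It assumes that an event is a record holding the substance plus the seven entity types, that "full event" scoring requires every field to match, and that "event and status" scoring compares only the substance and its status. The class `SubstanceEvent`, the functions `full_event_key`, `event_status_key`, and `f1_score`, and the label values (such as "current" and "none") are hypothetical names chosen for the example.

```python
from collections import Counter
from dataclasses import astuple, dataclass
from typing import Callable, Hashable, Iterable, Optional


@dataclass(frozen=True)
class SubstanceEvent:
    """One annotated event (hypothetical schema based on the seven entity types)."""
    substance: str                          # "tobacco", "alcohol", or "drug"
    status: Optional[str] = None            # label values here are assumptions
    substance_type: Optional[str] = None    # the "type" entity
    method: Optional[str] = None
    amount: Optional[str] = None
    frequency: Optional[str] = None
    exposure_history: Optional[str] = None
    quit_history: Optional[str] = None


def full_event_key(event: SubstanceEvent) -> Hashable:
    """Stringent matching: every field of the event must agree."""
    return astuple(event)


def event_status_key(event: SubstanceEvent) -> Hashable:
    """Relaxed matching: only the substance and its status must agree."""
    return (event.substance, event.status)


def f1_score(gold: Iterable[SubstanceEvent],
             predicted: Iterable[SubstanceEvent],
             key: Callable[[SubstanceEvent], Hashable]) -> float:
    """Compute F1 over events, counting a match when the keys are equal."""
    gold_counts = Counter(key(e) for e in gold)
    pred_counts = Counter(key(e) for e in predicted)
    # Multiset intersection gives the number of true positives.
    true_pos = sum((gold_counts & pred_counts).values())
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(pred_counts.values())
    recall = true_pos / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy data: the prediction misses the frequency of the tobacco event.
gold = [
    SubstanceEvent("tobacco", status="current", amount="1 pack", frequency="per day"),
    SubstanceEvent("alcohol", status="none"),
]
predicted = [
    SubstanceEvent("tobacco", status="current", amount="1 pack"),
    SubstanceEvent("alcohol", status="none"),
]

print(f1_score(gold, predicted, full_event_key))    # 0.5
print(f1_score(gold, predicted, event_status_key))  # 1.0
```

In this toy case a single missing entity drops the stringent score to 0.5 while the relaxed score stays at 1.0, which is consistent with the gap between the two reported F1 ranges.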
