An Automated System for Identifying Alcohol Use Status from Clinical Text

Alcohol use is one of the main risk factors for many diseases. However, alcohol use information is buried in the narrative text of patients' clinical records, and extracting it requires substantial manual labor. This work develops an automated system for detecting alcohol use status from patients' discharge summaries. The system combines machine learning and rule-based techniques to identify alcohol status in three stages. In the first stage, it detects alcohol-related sentences using a keyword search. In the second stage, it distinguishes negative from positive alcohol sentences and identifies their temporal status; several machine learning classifiers were evaluated at this stage to obtain the best performance. Finally, the document-level alcohol use status for each patient record is aggregated from the sentence-level results. The proposed system achieves an F1-score of up to 0.99 in identifying alcohol-use-related records, 0.96 in detecting negative records, and 0.89 in identifying temporal status.
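The three-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the negation and past-use cue patterns, the label names, and the aggregation priority are all assumptions made for illustration, and the second stage here uses simple cue rules as a stand-in for the machine learning classifiers the abstract mentions.

```python
import re

# Hypothetical keyword list; the abstract does not enumerate the keywords used.
ALCOHOL_KEYWORDS = ("alcohol", "etoh", "drinks", "drinking", "beer", "wine")

# Hypothetical cue patterns standing in for the trained classifiers of stage 2.
NEGATION_PATTERN = re.compile(r"\b(denies|no|never|negative for)\b")
PAST_PATTERN = re.compile(r"\b(former|quit|history of|used to)\b")


def split_sentences(text):
    """Naive sentence splitter: break on whitespace after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_alcohol_sentence(sentence):
    """Stage 1: keyword search for alcohol-related sentences."""
    lowered = sentence.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
        for keyword in ALCOHOL_KEYWORDS
    )


def classify_sentence(sentence):
    """Stage 2 (stand-in): label a sentence as negative, past, or current."""
    lowered = sentence.lower()
    if NEGATION_PATTERN.search(lowered):
        return "negative"
    if PAST_PATTERN.search(lowered):
        return "past"
    return "current"


def document_status(text):
    """Stage 3: aggregate sentence labels into one document-level status.

    The priority current > past > negative is an assumed rule.
    """
    labels = [
        classify_sentence(s) for s in split_sentences(text) if is_alcohol_sentence(s)
    ]
    if not labels:
        return "unknown"  # no alcohol-related sentence was found
    for status in ("current", "past", "negative"):
        if status in labels:
            return status
    return "unknown"


if __name__ == "__main__":
    note = "Patient denies alcohol use. He smokes one pack per day."
    print(document_status(note))
```

In a fuller version, `classify_sentence` would be replaced by a trained classifier over sentence features, while the keyword filter and the aggregation step could keep the same shape.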
