Classification of radiology reports for falls in an HIV study cohort

OBJECTIVE To identify patients in a human immunodeficiency virus (HIV) study cohort who have fallen by applying supervised machine learning methods to the cohort's radiology reports.

METHODS We used the Veterans Aging Cohort Study Virtual Cohort (VACS-VC), an electronic health record-based cohort of 146 530 veterans for whom radiology reports were available (N=2 977 739). We created a reference standard of radiology reports, represented each report by a feature set of words and Unified Medical Language System concepts, and then developed several support vector machine (SVM) classifiers for falls. We compared mutual information (MI) ranking with embedded feature selection. The SVM classifier with MI feature selection was chosen to classify all radiology reports in VACS-VC.

RESULTS Our SVM classifier with MI feature selection achieved an area under the curve score of 97.04 on the test set. When the classifier was applied to all the radiology reports in VACS-VC, 80 416 reports were classified as positive for a fall. Of these, 11 484 were associated with a fall-related external cause of injury code (E-code) and 68 932 were not; the latter correspond to 29 280 patients with potential fall-related injuries who could not have been found using E-codes.

DISCUSSION Feature selection was crucial to improving the classifier's performance. Feature selection with MI allowed us to choose the number of discriminative features used for classification, in contrast to the embedded feature selection method, in which the number of features is chosen automatically.

CONCLUSION Machine learning is an effective method of identifying patients who have suffered a fall. This classifier supplements the clinical researcher's toolkit and reduces dependence on under-coded structured electronic health record data.
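The MI ranking step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only the Python standard library, treats each report as a set of tokens, and scores each term by the mutual information between its presence and the fall label. The function names (`mutual_information`, `top_k_terms`) and the four toy reports are invented for illustration. The point it shows is the one made in the discussion: with a ranking, the number of retained features `k` is an explicit choice.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label."""
    n = len(docs)
    # Joint counts over (term present?, label) pairs.
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == label) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

def top_k_terms(docs, labels, k):
    """Rank the vocabulary by MI and keep the k highest-scoring terms."""
    vocab = sorted(set().union(*docs))
    scored = [(mutual_information(docs, labels, t), t) for t in vocab]
    # Highest MI first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, t in scored[:k]]

# Toy, invented reports as token sets; 1 = fall, 0 = no fall.
docs = [
    {"patient", "fell", "wrist", "fracture"},
    {"fell", "hip", "pain"},
    {"patient", "chest", "clear"},
    {"chest", "pain", "patient"},
]
labels = [1, 1, 0, 0]

selected = top_k_terms(docs, labels, k=2)
print(selected)
```

In a real setting the selected terms would define the feature vectors passed to an SVM, and `k` would be tuned on held-out data; both steps are outside this sketch.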
