Ontology-based Venous Thromboembolism Risk Factors Mining and Model Developing from Medical Records

The Padua linear model is widely used to assess the risk of venous thromboembolism (VTE), a common and preventable complication in inpatients. However, differences in race, genetics, and environment between Western and Chinese populations limit the Padua model's validity in Chinese patients. Extracting VTE risk factors from unstructured medical records in Chinese hospitals can help in understanding VTE events and developing an efficient risk assessment model. In this study, we propose an ontology-based method for mining VTE risk factors that combines natural language processing (NLP) and machine learning (ML). Medical records of 3106 inpatients were processed, and terms from multiple ontologies that were enriched in VTE patients were sorted automatically for each section of the records. ML methods were then used to estimate the importance of the terms; terms from the admitting diagnosis and progress notes showed better VTE prediction performance than terms from other sections. Finally, a novel VTE prediction model built on the selected terms achieved a higher AUC (0.815) than the Padua model (0.789).
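
As a rough illustration of the term-enrichment step described above, the following minimal sketch ranks terms by how much more often they appear in VTE records than in non-VTE records. It is not the method used in the study: the smoothed frequency ratio, the function name `term_enrichment`, and the toy records are all assumptions made for illustration, and real input would come from ontology terms matched in each section of the medical records.

```python
from collections import Counter


def term_enrichment(records, smoothing=1.0):
    """Rank terms by a smoothed ratio of VTE to non-VTE record frequency.

    records: iterable of (terms, is_vte) pairs, where terms is a collection
    of ontology term strings found in one patient's record.
    Returns a list of (term, ratio) pairs, highest ratio first.
    """
    vte_counts, ctrl_counts = Counter(), Counter()
    n_vte = n_ctrl = 0
    for terms, is_vte in records:
        unique_terms = set(terms)  # count each term once per record
        if is_vte:
            n_vte += 1
            vte_counts.update(unique_terms)
        else:
            n_ctrl += 1
            ctrl_counts.update(unique_terms)

    scores = {}
    for term in set(vte_counts) | set(ctrl_counts):
        # Additive smoothing keeps the ratio finite when a term is absent
        # from one of the two groups (an illustrative choice, not the paper's).
        p_vte = (vte_counts[term] + smoothing) / (n_vte + 2 * smoothing)
        p_ctrl = (ctrl_counts[term] + smoothing) / (n_ctrl + 2 * smoothing)
        scores[term] = p_vte / p_ctrl

    # Sort by descending ratio, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: (terms found in a record, VTE label).
RECORDS = [
    ({"active cancer", "reduced mobility"}, True),
    ({"active cancer", "fever"}, True),
    ({"fever"}, False),
    ({"fever", "cough"}, False),
    ({"reduced mobility"}, False),
]

ranked = term_enrichment(RECORDS)
for term, ratio in ranked:
    print(f"{term}: {ratio:.2f}")
```

Terms with a ratio well above 1 are candidates for the feature set of a downstream classifier, whose discrimination can then be compared with the Padua score by AUC as reported above.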
