Risk Factors Extraction from Clinical Texts based on Linked Open Data

This paper presents experiments in risk factor analysis based on clinical texts enhanced with Linked Open Data (LOD). The goal is to determine whether a patient has risk factors for a specific disease by analyzing only his or her outpatient records. A semantic graph of “meta-knowledge” about the disease of interest is constructed, integrating multilingual terms (labels) for symptoms, risk factors, etc. drawn from Wikidata, PubMed, Wikipedia and MeSH, and it is linked to the clinical records of individual patients via ICD-10 codes. A predictive model is then trained to identify patients at risk of developing the disease of interest. Testing was done on outpatient records from a nationwide repository covering the period 2011-2016. The results show that enriching the clinical texts with LOD resources improves the overall performance of all tested algorithms (kNN, Naive Bayes, decision tree, logistic regression and ANN).
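The enrichment idea can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: a tiny hypothetical dictionary (`LOD_LABELS`) stands in for the semantic graph built from Wikidata, PubMed, Wikipedia and MeSH; records are tokenized by whitespace; and the classifier is a from-scratch multinomial Naive Bayes (one of the algorithm families listed above, not the authors' implementation). The training notes, codes and labels are invented toy data.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for the LOD "meta-knowledge" graph:
# ICD-10 code -> labels of related terms (the paper gathers such labels
# from Wikidata, PubMed, Wikipedia and MeSH; these are illustrative only).
LOD_LABELS = {
    "E66": ["obesity"],
    "I10": ["hypertension"],
    "J45": ["asthma"],
}

def tokenize(text):
    return text.lower().split()

def enrich(text, icd10_codes, graph=LOD_LABELS):
    """Append the LOD labels linked to a record's ICD-10 codes to its tokens."""
    tokens = tokenize(text)
    for code in icd10_codes:
        for label in graph.get(code, []):
            tokens.extend(tokenize(label))
    return tokens

class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
            self.vocab.update(doc)
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_posterior(c):
            s = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # words unseen in training are ignored
                    s += math.log((self.counts[c][tok] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            return s

        return max(self.classes, key=log_posterior)

# Invented toy records: (note text, ICD-10 codes, 1 = at risk, 0 = not).
train = [
    ("patient reports fatigue", ["E66", "I10"], 1),
    ("elevated blood pressure noted", ["I10"], 1),
    ("routine checkup no complaints", [], 0),
    ("seasonal cough", ["J45"], 0),
]
docs = [enrich(text, codes) for text, codes, _ in train]
labels = [y for _, _, y in train]
model = MultinomialNB().fit(docs, labels)

print(model.predict(enrich("weight gain reported", ["E66"])))  # → 1
```

In this toy setup the new note "weight gain reported" shares no words with the training vocabulary; it is linked to the at-risk records only through the label "obesity", which is attached via the ICD-10 code E66 rather than written in the text.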
