Injecting Domain Knowledge in Electronic Medical Records to Improve Hospitalization Prediction

Electronic medical records (EMRs) contain key information about the different symptomatic episodes that a patient went through. They have great potential to improve patients' well-being and are therefore a valuable input for artificial intelligence approaches. However, the explicit knowledge directly available in these records remains limited: the features extracted from them for machine learning algorithms do not capture all the implicit knowledge of medical experts. To evaluate the impact of domain knowledge when processing EMRs, we augment the features extracted from EMRs with ontological resources before turning them into vectors used by machine learning algorithms. We evaluate these augmentations with several machine learning algorithms to predict hospitalization. We tested our approach on data from the PRIMEGE PACA database, which contains more than 350,000 consultations carried out by 16 general practitioners (GPs).
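The augmentation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the concept map, the concept labels, and the helper names `augment` and `vectorize` are hypothetical and chosen only for illustration, and a plain count-based bag of features stands in for whatever vector representation is actually used. Each record's tokens are extended with the ontology concepts linked to them, and the augmented records are then turned into count vectors.

```python
from collections import Counter

# Hypothetical mapping from surface terms to ontology concepts.
# A real system would obtain these links from ontological resources.
CONCEPT_MAP = {
    "metformin": ["concept:antidiabetic_drug"],
    "insulin": ["concept:antidiabetic_drug"],
    "dyspnea": ["concept:respiratory_symptom"],
    "cough": ["concept:respiratory_symptom"],
}


def augment(tokens, concept_map=CONCEPT_MAP):
    """Return the original tokens followed by the concepts linked to them."""
    concepts = [c for t in tokens for c in concept_map.get(t, [])]
    return tokens + concepts


def vectorize(documents):
    """Build a sorted vocabulary and one count vector per document."""
    vocabulary = sorted({feature for doc in documents for feature in doc})
    index = {feature: i for i, feature in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        vec = [0] * len(vocabulary)
        for feature, count in Counter(doc).items():
            vec[index[feature]] = count
        vectors.append(vec)
    return vocabulary, vectors


# Two toy consultation notes (invented for the example, not real data).
records = [
    "patient reports cough and dyspnea".split(),
    "renewal of metformin prescription".split(),
]
augmented = [augment(r) for r in records]
vocabulary, vectors = vectorize(augmented)
```

In this sketch, two different symptoms in the first note contribute to the same concept feature, which is the kind of shared signal a classifier cannot see from the raw terms alone. The resulting vectors could then be passed to any standard classifier to predict hospitalization.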
