Causal risk factor discovery for severe acute kidney injury using electronic health records

Background: Acute kidney injury (AKI), characterized by abrupt deterioration of renal function, is a common clinical event among hospitalized patients and is associated with high morbidity and mortality. AKI is defined in three stages, with Stage-3 being the most severe phase, which is irreversible. Discovering the true risk factors is important for identifying high-risk AKI patients and for better targeting of tailored interventions. However, Stage-3 AKI patients are very rare (only 0.2% of AKI patients), while the number of features available in electronic health records (EHR) is large (1917 potential risk features), a setting in which correlation-based feature selection and modeling methods are infeasible. This study aims to discover the key factors and improve the detection of Stage-3 AKI.

Methods: A causal discovery method (McDSL) is adopted to infer the true causal relationships between information buried in the EHR (such as medications, diagnoses, laboratory tests, and comorbidities) and Stage-3 AKI risk. The research approach comprises two major phases: data collection and causal discovery. In the first phase, data are collected from the EHR (358 encounters and 891 risk factors). In the second phase, McDSL is employed to discover the causal risk factors of Stage-3 AKI, and five well-known machine learning models are built to predict Stage-3 AKI with 10-fold cross-validation (predictive accuracy was measured by AUC, precision, recall and F-score).

Results: McDSL is useful for further EHR research. It discovers four causal features, all of which are modifiable medications. Recent machine learning methods are used to compare predictive performance, and the experimental results confirm that the selected features are pivotal.

Conclusions: The features selected by McDSL enable significant dimension reduction without sacrificing prediction accuracy, suggesting potential clinical use such as helping physicians develop better prevention and treatment strategies.
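
The evaluation protocol named in the Methods (10-fold cross-validation scored by AUC, precision, recall and F-score) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the fold split is a plain contiguous split rather than whatever scheme the study used, and the F-score is assumed to be the balanced F1 for the positive (Stage-3) class.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    """
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    pairs = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        pairs.append((train, test))
    return pairs


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.5, 0.1]
    print(precision_recall_f1(y_true, y_pred))
    print(auc(y_true, scores))
    # 358 matches the number of encounters reported in the Methods.
    print([len(test) for _, test in k_fold_indices(358, 10)])
```

With a rare outcome such as Stage-3 AKI, a contiguous split can leave a fold with no positive cases, in which case `auc` raises an error; a stratified split that preserves the class ratio in every fold is the usual remedy.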
