Prediction of chronic kidney disease risk using multimodal data

Chronic kidney disease (CKD) is a widespread public health problem that often progresses to kidney failure, which requires hemodialysis or kidney transplantation. Predicting the risk of CKD in healthy people is therefore highly desirable. However, most studies in this field have used logistic regression (LR) and reported limited accuracy, and they have ignored unstructured data that contain useful information. To improve CKD prediction, we built a novel multimodal data model that integrates Bidirectional Encoder Representations from Transformers (BERT) with Light Gradient Boosting Machine (LGBM), hereafter termed the MD-BERT-LGBM model, and applied it to a cohort of 3295 participants. We collected medical data from each participant over a period of more than three months. We compared this integrated framework with three conventional models: LR, LGBM, and the Multimodal Disease Risk Prediction algorithm based on Convolutional Neural Networks (CNN-MDRP). The experimental results show that the MD-BERT-LGBM model outperformed all three conventional models in accuracy, recall, and area under the ROC curve (AUC), achieving 78.12%, 75.65%, and 85.15%, respectively. These results demonstrate the potential of the proposed method for clinical application in CKD prediction and prevention.
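For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, recall, and AUC can be computed for a binary CKD classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and do not come from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of participants whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true CKD cases (label 1) that are predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not results from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = developed CKD
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]   # hypothetical model risk scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"recall   = {recall(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

Accuracy and recall depend on the chosen decision threshold, whereas AUC is computed from the ranking of the raw scores and so summarizes performance across all thresholds.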
