Integrating Co-Clustering and Interpretable Machine Learning for the Prediction of Intravenous Immunoglobulin Resistance in Kawasaki Disease

Identifying patients who are resistant to intravenous immunoglobulin (IVIG) is essential for the prompt and optimal treatment of Kawasaki disease, which calls for effective risk assessment tools. Data-driven approaches have the potential to identify high-risk individuals by capturing complex patterns in real-world data. To enable clinically applicable prediction of IVIG resistance while addressing both the incompleteness of clinical data and the lack of interpretability of machine learning models, a multi-stage method is developed that integrates missing-data pattern mining with intelligible models. First, co-clustering is adopted to characterize the block-wise missing-data patterns by simultaneously grouping clinical features and patients, which enables (a) group-based feature selection and missing-data imputation and (b) patient subgroup-specific predictive models that account for data availability. Second, feature selection is performed using the group Lasso to uncover group-specific risk factors. Third, the Explainable Boosting Machine, an interpretable learning method based on generalized additive models, is applied to build a predictive model for each patient subgroup. Experiments on real-world electronic health records show that the proposed framework outperforms a set of benchmark methods in predictive modeling. This study highlights the integration of co-clustering and supervised learning methods for mining incomplete clinical data, and promotes data-driven approaches for investigating predictors and developing effective algorithms for decision making in healthcare.
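The idea behind the first stage can be illustrated with a minimal sketch. It is not the authors' co-clustering algorithm: it assumes perfectly block-structured missingness and simply groups patients and features that share an identical observed/missing pattern, which is the degenerate case a real co-clustering method would generalize to noisy data. The toy records, the feature order, and the function names `group_by_pattern`, `missingness_blocks`, and `available_feature_groups` are all hypothetical.

```python
def group_by_pattern(vectors):
    """Give identical vectors the same label, numbered by first appearance."""
    seen, labels = {}, []
    for v in vectors:
        key = tuple(v)
        if key not in seen:
            seen[key] = len(seen)
        labels.append(seen[key])
    return labels


def missingness_blocks(records):
    """Build the observed/missing mask and group its rows and columns.

    Assumption: missing values are encoded as None. Exact-pattern grouping
    is a simplified stand-in for co-clustering, valid only for clean blocks.
    """
    mask = [[0 if x is None else 1 for x in row] for row in records]
    row_labels = group_by_pattern(mask)        # patient subgroups
    col_labels = group_by_pattern(zip(*mask))  # feature groups
    return mask, row_labels, col_labels


def available_feature_groups(mask, row_labels, col_labels):
    """Map each patient subgroup to the feature groups it has observed."""
    available = {}
    for r, g in enumerate(row_labels):
        if g not in available:
            observed = {col_labels[c] for c, seen in enumerate(mask[r]) if seen}
            available[g] = sorted(observed)
    return available


# Hypothetical toy data: 4 patients x 5 clinical features (made-up values).
records = [
    [38.5, 12.1, 85.0, None, None],
    [39.0, 15.3, 60.2, None, None],
    [38.2, 9.8, None, 1.2, 40.0],
    [39.4, 11.0, None, 0.8, 35.5],
]

mask, row_labels, col_labels = missingness_blocks(records)
available = available_feature_groups(mask, row_labels, col_labels)
print(row_labels)  # patient subgroup of each record
print(col_labels)  # feature group of each column
print(available)   # feature groups usable per patient subgroup
```

In a pipeline following the abstract, each patient subgroup would then get its own feature selection and predictive model, restricted to the feature groups that are available for that subgroup.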
