GSIC: A New Interpretable System for Knowledge Exploration and Classification

Machine learning and data mining techniques have developed rapidly in recent years. In tasks such as classification, machine learning models have been shown to match and even surpass human performance. However, high-performing models are usually complex, opaque, and poorly interpretable, making it difficult to explain the behaviors that lead to their final outcomes. In many domains, such as medicine and healthcare, interpretability is one of the most important factors when considering the adoption of such models. In this paper, we propose a two-stage binary classification system, applicable to healthcare (or general) data, that offers a high level of interpretability while achieving results comparable to commonly used classification techniques. The motivation behind the proposed system is the lack of effective classification methods for data generated by heterogeneous distributions (such as healthcare or banking data) that can harmonize performance and interpretability. In this work, we tackle the problem by applying a divide-and-conquer strategy to a new disentangled representation of the underlying data. The merit of our system is evaluated through classification experiments on a wide range of real datasets against popular transparent and black-box models. Furthermore, a use case on data from sepsis patients in the intensive care unit (ICU) is presented to demonstrate the interpretability of the proposed model.
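The abstract does not spell out the algorithm, but the two-stage divide-and-conquer idea can be illustrated with a minimal sketch: stage one partitions the feature space into regions, and stage two fits a small transparent model inside each region, so every prediction can be traced to one local, inspectable model. Everything below is an assumption for illustration, not the authors' GSIC method: the `TwoStageClassifier` class is hypothetical, KMeans stands in for whatever partitioner the paper actually uses (the cited literature suggests a prototype-based method such as a self-organizing map), and logistic regression stands in for the local transparent classifier.

```python
# Illustrative sketch of a two-stage divide-and-conquer classifier.
# NOT the paper's GSIC algorithm: KMeans and LogisticRegression are
# assumed stand-ins for the partitioner and local transparent models.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


class TwoStageClassifier:
    """Stage 1 ("divide"): partition the feature space into regions.
    Stage 2 ("conquer"): fit one small transparent model per region."""

    def __init__(self, n_regions=4, random_state=0):
        self.partitioner = KMeans(n_clusters=n_regions, n_init=10,
                                  random_state=random_state)
        self.experts = {}    # region id -> fitted local model
        self.fallbacks = {}  # region id -> majority class (pure regions)

    def fit(self, X, y):
        regions = self.partitioner.fit_predict(X)
        for r in np.unique(regions):
            mask = regions == r
            if len(np.unique(y[mask])) < 2:
                # Region contains a single class: store it directly.
                self.fallbacks[r] = int(np.bincount(y[mask]).argmax())
            else:
                self.experts[r] = LogisticRegression(max_iter=1000)
                self.experts[r].fit(X[mask], y[mask])
        return self

    def predict(self, X):
        regions = self.partitioner.predict(X)
        out = np.empty(len(X), dtype=int)
        for r in np.unique(regions):
            mask = regions == r
            if r in self.experts:
                out[mask] = self.experts[r].predict(X[mask])
            else:
                # Default to class 0 if a region was never seen in fit.
                out[mask] = self.fallbacks.get(r, 0)
        return out


if __name__ == "__main__":
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    clf = TwoStageClassifier(n_regions=4).fit(X_tr, y_tr)
    print("accuracy:", (clf.predict(X_te) == y_te).mean())
```

The interpretability benefit of this structure is that each prediction decomposes into two human-readable steps: which region the sample fell into, and the coefficients of the small linear model responsible for that region.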
