Opening the Black Box: Exploring Temporal Pattern of Type 2 Diabetes Complications in Patient Clustering Using Association Rules and Hidden Variable Discovery

There is considerable debate over the importance of explanation in AI models inferred from health data. In particular, a balance must be struck between the accuracy of complex 'deep' models, such as convolutional neural networks, and the transparency of models that aim to represent data in a more 'human' way, such as expert systems. In this paper, we explore the use of temporal association rules to validate and uncover the meaning behind discrete hidden variables that have been inferred from clinical diabetes data. We use a recently published technique based upon the IC* (Inductive Causation) algorithm that limits the number of hidden variables and places them within a network structure. Here, we take the hidden variables and compare their underlying discrete states to clusters that have been generated from temporal association rules. This allows us to characterise the hidden states in terms of different sequences of complications. The results are promising: many hidden states align with the discovered clusters, giving us a direct interpretation.
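As a rough illustration of the comparison step described above, the following Python sketch cross-tabulates each patient's hidden state against a cluster label and reports, for every hidden state, the cluster it overlaps with most. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `align_states_to_clusters`, the toy data, and the "fraction in the dominant cluster" score are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def align_states_to_clusters(hidden_states, cluster_labels):
    """Map each hidden state to its dominant cluster.

    Returns a dict: state -> (dominant cluster, fraction of that
    state's patients that fall in the dominant cluster).
    Hypothetical helper; the scoring rule is an assumption.
    """
    if len(hidden_states) != len(cluster_labels):
        raise ValueError("inputs must have the same length")

    # Contingency table: counts[state][cluster] = number of patients.
    counts = defaultdict(Counter)
    for state, cluster in zip(hidden_states, cluster_labels):
        counts[state][cluster] += 1

    alignment = {}
    for state, cluster_counts in counts.items():
        best_cluster, best_count = cluster_counts.most_common(1)[0]
        total = sum(cluster_counts.values())
        alignment[state] = (best_cluster, best_count / total)
    return alignment


# Toy data (invented): one hidden state and one cluster label per patient.
hidden_states = [0, 0, 0, 1, 1, 1, 1, 2]
cluster_labels = ["A", "A", "B", "B", "B", "B", "A", "C"]

alignment = align_states_to_clusters(hidden_states, cluster_labels)
for state in sorted(alignment):
    cluster, fraction = alignment[state]
    print(f"hidden state {state} -> cluster {cluster} ({fraction:.2f})")
```

A hidden state whose patients fall almost entirely in one cluster can then be read in terms of the complication sequences that define that cluster; a state spread across several clusters has no such direct interpretation.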
