Data-Driven Visual Characterization of Patient Health-Status Using Electronic Health Records and Self-Organizing Maps

Hypertension and diabetes are among the major chronic conditions worldwide, particularly in developed countries, and have become a global health and economic issue. Addressing this problem requires better knowledge of these diseases in order to characterize chronic patients. Our aim is twofold: (1) to provide an efficient visual tool for identifying clinical patterns in high-dimensional data; and (2) to characterize the patient health-status through a data-driven approach using electronic health records of healthy, hypertensive and diabetic populations. We propose a two-stage methodology that uses the diagnosis and drug codes of healthy and chronic patients associated with the University Hospital of Fuenlabrada in Spain. The first stage applies the Self-Organizing Map to these data to obtain a set of prototype patients, which are projected onto a grid of nodes. Each node is associated with a prototype patient that captures relationships among clinical characteristics. In the second stage, clustering methods are applied to the prototype patients to find groups of patients with a similar health-status. We found clusters with distinctive patterns linked to chronic conditions. The most notable were a cluster of pregnant women within the hypertensive population and two clusters of diabetic individuals with significant differences in drug therapy (insulin-dependent and non-insulin-dependent). The proposed methodology proved effective for exploring relationships within clinical data and for finding patterns related to diabetes and hypertension visually. Because it is data-driven, the methodology is a suitable alternative for building appropriate clinical groups and a promising approach that can be applied to any population. A thorough analysis of these groups could yield new findings.
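
The two-stage idea described above (a Self-Organizing Map that yields prototype patients on a grid, followed by clustering of those prototypes) can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic binary "code present/absent" vectors rather than hospital records, the grid size, learning-rate schedule and neighbourhood width are arbitrary choices, and plain k-means stands in for whichever clustering methods the study actually used. The names `train_som` and `kmeans` are hypothetical helpers.

```python
import numpy as np

def train_som(X, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Stage 1: online SOM training. Returns one prototype per grid node."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # (row, col) position of every node on the 2-D grid
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes with randomly chosen patients
    W = X[rng.integers(0, n, size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays to 0
        sigma = sigma0 + (0.5 - sigma0) * frac    # neighbourhood shrinks to 0.5
        x = X[rng.integers(0, n)]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))       # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)        # grid distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))              # Gaussian neighbourhood
        W += lr * h[:, None] * (x - W)                    # pull neighbours towards x
    return W

def kmeans(P, k, n_iter=50, seed=0):
    """Stage 2: plain k-means on the SOM prototypes (assumed stand-in)."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels

# Synthetic data: 3 groups of 100 "patients", 12 binary codes each.
# Each group has its own block of 4 frequent codes (assumption for illustration).
rng = np.random.default_rng(42)
n_codes = 12
profiles = np.full((3, n_codes), 0.05)
profiles[0, 0:4] = 0.8
profiles[1, 4:8] = 0.8
profiles[2, 8:12] = 0.8
X = np.vstack([(rng.random((100, n_codes)) < p).astype(float) for p in profiles])

W = train_som(X, rows=6, cols=6)   # 36 prototype patients
labels = kmeans(W, k=3)            # cluster label for each grid node
print(labels.reshape(6, 6))        # cluster map over the SOM grid
```

Printing the labels reshaped to the grid gives a crude version of the visual map: neighbouring nodes tend to share a cluster because the SOM preserves topology. Each prototype component can be read as the approximate frequency of a code among the patients mapped near that node.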
