Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records

Recent advances in electronic health record (EHR) systems have led to data being produced at an unprecedented rate. The complex, growing, and high-dimensional data available in EHRs create great opportunities for machine learning techniques such as clustering. Cluster analysis often requires dimension reduction to achieve efficient processing time and mitigate the curse of dimensionality. Given the wide range of techniques available for dimension reduction and cluster analysis, it is not straightforward to identify which combination of techniques from the two families leads to the desired result. Deriving useful and precise insights from EHRs requires a deeper understanding of the data, intermediary results, configuration parameters, and analysis processes. Existing studies often tackle these tasks separately. We present a visual analytics (VA) system, called Visual Analytics for Cluster Analysis and Dimension Reduction of High Dimensional Electronic Health Records (VALENCIA), that addresses the challenges of high-dimensional EHRs in a single system. VALENCIA brings together a wide range of cluster analysis and dimension reduction techniques, integrates them seamlessly, and makes them accessible to users through interactive visualizations. It balances the processing load between users and the system so that users can perform high-level cognitive tasks that would be difficult without the aid of a VA system. Through a real case study, we demonstrate how VALENCIA can be used to analyze the healthcare administrative dataset stored at ICES. This research also highlights what needs to be considered in the future when developing VA systems designed to derive deep and novel insights into EHRs.
