PhenoTree: Interactive Visual Analytics for Hierarchical Phenotyping From Large-Scale Electronic Health Records

Electronic health records (EHRs) capture comprehensive patient information in digital form from a variety of sources. The increasing availability of EHRs has facilitated the development of data and visual analytics tools for healthcare, such as clinical decision support and patient care management systems. Many of these tools are used to investigate fundamental problems, such as studying patient populations, exploring complex interactions among patients and their medical histories, and extracting structured phenotypes that characterize a patient population. In this paper, we propose PHENOTREE, a novel data-driven, hierarchical, and interactive phenotyping tool that enables physicians and medical researchers to participate in the phenotyping of large-scale EHR cohorts. The proposed visual analytics tool allows users to interactively explore EHR cohorts and to generate, interpret, evaluate, and refine phenotypes by building and navigating a phenotype hierarchy. Specifically, given a cohort or subcohort, PHENOTREE employs sparse principal component analysis (SPCA) to identify the key clinical features that characterize the population. These clinical features provide a natural way to generate deeper phenotypes at finer granularities by expanding the phenotype hierarchy. To support the intensive computation required for interactive analytics, we design an efficient SPCA solver based on a variance-reduced stochastic gradient technique. The benefits of our method are demonstrated by analyzing two EHR patient cohorts, one public and one private, containing the EHRs of 101,767 and 223,076 patients, respectively. Our evaluations show that PHENOTREE can detect clinically meaningful hierarchical phenotypes.
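To make the SPCA step concrete, the following is a minimal sketch of how a sparse loading vector can single out a few key features of a cohort. It is not the solver proposed in the paper: it swaps in a simple soft-thresholded power iteration instead of the variance-reduced stochastic gradient method, and the feature names, the synthetic cohort, the penalty `lam`, and all function names are assumptions made purely for illustration.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_loading(cov, lam, iters=100):
    """Leading sparse loading via soft-thresholded power iteration.

    Illustrative stand-in only, not the paper's stochastic solver.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        w = [soft_threshold(x, lam) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * d  # lam is too large: every feature was zeroed out
        v = [x / norm for x in w]
    return v


def key_features(names, loading, tol=1e-8):
    """Names of features with nonzero loading, largest magnitude first."""
    ranked = sorted(zip(names, loading), key=lambda pair: -abs(pair[1]))
    return [name for name, weight in ranked if abs(weight) > tol]


# Synthetic cohort (hypothetical): three features driven by one shared
# latent factor, plus two low-variance features unrelated to it.
rng = random.Random(0)
names = ["diabetes", "hypertension", "ckd", "asthma", "fracture"]
rows = []
for _ in range(200):
    z = rng.gauss(0.0, 1.0)
    related = [z + 0.1 * rng.gauss(0.0, 1.0) for _ in range(3)]
    unrelated = [0.1 * rng.gauss(0.0, 1.0) for _ in range(2)]
    rows.append(related + unrelated)

loading = sparse_loading(covariance(rows), lam=0.2)
selected = key_features(names, loading)
print(selected)
```

On this toy cohort the sparsity penalty drives the loadings of the two unrelated features to exactly zero, so only the three correlated features remain as candidates for describing the population. Re-running the same procedure on a subcohort is one way to picture how a finer level of a hierarchy could be produced, although how PHENOTREE actually defines subcohorts is not specified here.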
