Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization

OBJECTIVE Our aim is to extract clinically meaningful phenotypes from longitudinal electronic health records (EHRs) of medically complex children. These fragile patients consume a disproportionate share of pediatric care resources yet often have suboptimal clinical outcomes. The growing availability of EHRs provides a rich data source that can be used to disentangle their complex clinical conditions into concise, clinically meaningful groups of characteristics. We aim to identify those phenotypes and their temporal evolution in a scalable, computational manner that avoids time-consuming manual chart review.

MATERIALS AND METHODS We analyze longitudinal EHRs from Children's Healthcare of Atlanta covering 1045 medically complex patients with a total of 59,948 encounters over 2 years. We apply a tensor factorization method called PARAFAC2 to extract: (a) clinically meaningful groups of features, (b) concise patient representations indicating the presence of a phenotype for each patient, and (c) temporal signatures indicating the evolution of those phenotypes over time for each patient.

RESULTS We identified four medically complex phenotypes, namely gastrointestinal disorders, oncological conditions, blood-related disorders, and neurological system disorders, which have distinct clinical characterizations among patients. We demonstrate the utility of the patient representations produced by PARAFAC2 for identifying groups of patients with significant survival variations. Finally, we showcase representative examples of the temporal phenotypic trends extracted for different patients.

DISCUSSION Unsupervised temporal phenotyping is an important task because it minimizes the burden on clinical experts by limiting their involvement to validating the output phenotypes. PARAFAC2 has several compelling properties for temporal computational phenotyping: (a) it can handle high-dimensional data and variable numbers of encounters across patients, (b) it has an intuitive interpretation, and (c) it is free from ad hoc parameter choices. Computational phenotypes, such as the ones computed by our approach, have multiple applications; we highlight three that are particularly useful for medically complex children: (1) integration into clinical decision support systems, (2) interpretable mortality prediction, and (3) clinical trial recruitment.

CONCLUSION PARAFAC2 can be applied to unsupervised temporal phenotyping tasks where precise definitions of different phenotypes are absent and the lengths of patient records vary.
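The three outputs listed in the methods can be read off the standard PARAFAC2 model, in which each patient's encounters-by-features matrix X_k is approximated as U_k S_k V^T with U_k = Q_k H and Q_k having orthonormal columns. The abstract does not spell this formulation out, so the sketch below is only an illustration of that textbook model on synthetic data, not the authors' implementation or fitting procedure. The function name `simulate_parafac2`, the feature count, and the encounter counts are hypothetical; only the rank of 4 echoes the four reported phenotypes.

```python
import numpy as np


def simulate_parafac2(encounters, n_features, rank, seed=0):
    """Build synthetic patient matrices that follow the PARAFAC2 model exactly.

    encounters : list of ints, number of encounters per patient (may differ)
    n_features : number of clinical features (hypothetical value in the demo)
    rank       : number of phenotypes
    """
    rng = np.random.default_rng(seed)
    V = rng.random((n_features, rank))   # (a) feature groups, shared by all patients
    H = rng.random((rank, rank))         # shared matrix linking all temporal signatures
    X, U, S = [], [], []
    for n_enc in encounters:
        # Q_k has orthonormal columns (requires n_enc >= rank in this sketch)
        Q_k, _ = np.linalg.qr(rng.standard_normal((n_enc, rank)))
        U_k = Q_k @ H                    # (c) temporal signature for this patient
        s_k = rng.random(rank)           # (b) patient representation: diagonal of S_k
        X.append(U_k @ np.diag(s_k) @ V.T)   # encounters-by-features matrix
        U.append(U_k)
        S.append(s_k)
    return X, U, np.array(S), V, H


# Three synthetic patients with different numbers of encounters.
X, U, S, V, H = simulate_parafac2(encounters=[5, 12, 8], n_features=30, rank=4)
for k, X_k in enumerate(X):
    print(f"patient {k}: X_k shape {X_k.shape}, phenotype weights {np.round(S[k], 2)}")
```

Because every U_k is built as Q_k H, the cross-product U_k^T U_k equals H^T H for all patients. This constraint is what lets the model accept records of different lengths while still sharing a single set of phenotype definitions V.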
