Joint Learning of Phenotypes and Diagnosis-Medication Correspondence via Hidden Interaction Tensor Factorization

Non-negative tensor factorization has been shown to be effective for discovering phenotypes from electronic health record (EHR) data with minimal human supervision. In most cases, an interaction tensor of the elements in the EHR (e.g., diagnoses and medications) must first be established before the factorization can be applied. Such correspondence information, however, is often missing. While different heuristics can be used to estimate the missing correspondence, any errors they introduce propagate to the subsequent phenotype discovery task and reduce its accuracy. This is especially true for patients diagnosed with multiple diseases (e.g., those under critical care). To alleviate this limitation, we propose hidden interaction tensor factorization (HITF), in which the diagnosis-medication correspondence and the underlying phenotypes are inferred simultaneously. We formulate it under a Poisson non-negative tensor factorization framework and learn the HITF model via maximum likelihood estimation. For performance evaluation, we applied HITF to the MIMIC-III dataset. Our empirical results show that both the inferred phenotypes and the inferred correspondence are clinically meaningful. In addition, the inferred HITF model outperforms a number of state-of-the-art methods for mortality prediction.
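As a rough illustration of the Poisson non-negative tensor factorization framework mentioned above, the following sketch fits a rank-R CP model to a fully observed count tensor by maximum likelihood, using the standard multiplicative updates for the KL divergence. It is not the HITF algorithm: HITF treats the interaction tensor as hidden, whereas this sketch assumes the patient-by-diagnosis-by-medication tensor `X` is given. The function names `poisson_nll` and `poisson_ntf`, the synthetic data, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def poisson_nll(X, A, B, C, eps=1e-10):
    """Poisson negative log-likelihood of X under the CP model,
    dropping the constant log(X!) term."""
    Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return float(np.sum(Xhat - X * np.log(Xhat + eps)))


def poisson_ntf(X, rank, n_iter=100, seed=0, eps=1e-10):
    """Generic Poisson non-negative CP factorization (illustrative only).

    X is assumed to be an observed non-negative count tensor of shape
    (patients, diagnoses, medications). Returns the three factor
    matrices and the objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive initialization keeps the multiplicative updates valid.
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    history = [poisson_nll(X, A, B, C, eps)]
    for _ in range(n_iter):
        # Each update rescales one factor by the ratio of observed to
        # reconstructed counts, projected onto the other two factors.
        ratio = X / (np.einsum('ir,jr,kr->ijk', A, B, C) + eps)
        A *= np.einsum('ijk,jr,kr->ir', ratio, B, C) / (B.sum(0) * C.sum(0) + eps)
        ratio = X / (np.einsum('ir,jr,kr->ijk', A, B, C) + eps)
        B *= np.einsum('ijk,ir,kr->jr', ratio, A, C) / (A.sum(0) * C.sum(0) + eps)
        ratio = X / (np.einsum('ir,jr,kr->ijk', A, B, C) + eps)
        C *= np.einsum('ijk,ir,jr->kr', ratio, A, B) / (A.sum(0) * B.sum(0) + eps)
        history.append(poisson_nll(X, A, B, C, eps))
    return A, B, C, history


# Synthetic example: 20 patients, 8 diagnoses, 6 medications, 3 phenotypes.
rng = np.random.default_rng(42)
true_tensor = 5.0 * np.einsum(
    'ir,jr,kr->ijk', rng.random((20, 3)), rng.random((8, 3)), rng.random((6, 3))
)
X = rng.poisson(true_tensor).astype(float)
A, B, C, history = poisson_ntf(X, rank=3)
print(history[0], history[-1])
```

In this sketch, each column of `B` and `C` can be read as one candidate phenotype's weights over diagnoses and medications, and each row of `A` as a patient's membership in those phenotypes.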
