Integrating hypertension phenotype and genotype with hybrid non‐negative matrix factorization

Motivation: Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements, with the goal of identifying subtypes of patients who share similar pathophysiologic mechanisms and may respond more uniformly to targeted treatments. Existing machine learning approaches often struggle to integrate phenotype and genotype information and to present clinicians with an interpretable model. We aim to provide informed patient stratification based on phenotype and genotype features.

Results: In this article, we present a hybrid non‐negative matrix factorization (HNMF) method to integrate phenotype and genotype information for patient stratification. HNMF simultaneously approximates the phenotypic and genetic feature matrices, using a loss function appropriate to each, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates the phenotypic matrix under Frobenius loss and the genetic matrix under Kullback‐Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulations show that HNMF converges quickly and accurately to the true factor matrices. On a real‐world clinical dataset, we used the patient factor matrix as features and examined the association of these features with indices of cardiac mechanics. We compared HNMF with six different models that use phenotype or genotype features alone, with or without NMF, or that use joint NMF with only one type of loss. We also compared HNMF with three recently published methods for integrative clustering analysis: iClusterBayes, Bayesian joint analysis and JIVE. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype‐genotype interactions that characterize cardiac abnormalities.

Availability and implementation: Our code is publicly available on GitHub at https://github.com/yuanluo/hnmf.

Supplementary information: Supplementary data are available at Bioinformatics online.
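The hybrid objective and the alternating projected gradient idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (that is in the repository linked above); it is a simplified illustration under stated assumptions. The names `W` (shared patient factor), `Hp` and `Hg` (phenotype and genotype group factors), the weight `lam` on the KL term, the generalized KL form, and the simple backtracking step-size rule are all choices made here for illustration and are not taken from the paper.

```python
import numpy as np

EPS = 1e-10  # guards against log(0) and division by zero


def hybrid_loss(Xp, Xg, W, Hp, Hg, lam=1.0):
    """Frobenius loss on the phenotype matrix plus generalized KL loss on the genotype matrix."""
    frob = 0.5 * np.sum((Xp - W @ Hp) ** 2)
    R = W @ Hg + EPS
    kl = np.sum(Xg * np.log((Xg + EPS) / R) - Xg + R)
    return frob + lam * kl


def gradients(Xp, Xg, W, Hp, Hg, lam=1.0):
    """Gradients of hybrid_loss with respect to W, Hp and Hg."""
    Ep = W @ Hp - Xp                      # derivative of the Frobenius term w.r.t. W @ Hp
    Q = 1.0 - Xg / (W @ Hg + EPS)         # derivative of the KL term w.r.t. W @ Hg
    gW = Ep @ Hp.T + lam * (Q @ Hg.T)
    gHp = W.T @ Ep
    gHg = lam * (W.T @ Q)
    return gW, gHp, gHg


def projected_step(loss_fn, M, grad, step=1.0, shrink=0.5, max_tries=30):
    """Gradient step projected onto the non-negative orthant, with backtracking.

    The step is halved until the loss does not increase; if no such step is
    found, M is returned unchanged, so the loss never goes up.
    """
    base = loss_fn(M)
    for _ in range(max_tries):
        cand = np.maximum(M - step * grad, 0.0)
        if loss_fn(cand) <= base:
            return cand
        step *= shrink
    return M


def hnmf_sketch(Xp, Xg, k, lam=1.0, n_iter=200, seed=0):
    """Alternate projected gradient updates over W, Hp and Hg (illustrative only)."""
    rng = np.random.default_rng(seed)
    W = rng.random((Xp.shape[0], k))
    Hp = rng.random((k, Xp.shape[1]))
    Hg = rng.random((k, Xg.shape[1]))
    history = [hybrid_loss(Xp, Xg, W, Hp, Hg, lam)]
    for _ in range(n_iter):
        gW, _, _ = gradients(Xp, Xg, W, Hp, Hg, lam)
        W = projected_step(lambda M: hybrid_loss(Xp, Xg, M, Hp, Hg, lam), W, gW)
        _, gHp, _ = gradients(Xp, Xg, W, Hp, Hg, lam)
        Hp = projected_step(lambda M: hybrid_loss(Xp, Xg, W, M, Hg, lam), Hp, gHp)
        _, _, gHg = gradients(Xp, Xg, W, Hp, Hg, lam)
        Hg = projected_step(lambda M: hybrid_loss(Xp, Xg, W, Hp, M, lam), Hg, gHg)
        history.append(hybrid_loss(Xp, Xg, W, Hp, Hg, lam))
    return W, Hp, Hg, history
```

In this sketch the rows of `W` play the role of the patient factor matrix, which could then be used as features in a downstream model. Sharing `W` across both terms is what couples the phenotype and genotype approximations; each term keeps its own loss.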
