Identifying temporal molecular signatures underlying cardiovascular diseases: A data science platform

OBJECTIVE During cardiovascular disease progression, the molecular systems of the myocardium (e.g., the proteome) undergo diverse and distinct changes. Dynamic, temporally regulated alterations of individual molecules underlie the collective response of the heart to pathological drivers and the ultimate development of pathogenesis. Advances in high-throughput omics technologies have enabled cost-effective temporal profiling of targeted systems in animal models of human disease. However, computational analysis of temporal patterns in omics data remains challenging. In particular, bioinformatic pipelines that use unsupervised statistical approaches to support cardiovascular investigations are lacking, which hinders the extraction of biomedical insights from these complex datasets.

APPROACH AND RESULTS We developed a non-parametric data analysis platform to resolve computational challenges unique to temporal omics datasets. The platform consists of three modules. Module I preprocesses the temporal data using either cubic splines or principal component analysis (PCA), accomplishing missing-data imputation and denoising in a single step. Module II performs unsupervised classification by K-means or hierarchical clustering. Module III evaluates and identifies biological entities (e.g., molecular events) that exhibit strong associations with specific temporal patterns. The jackstraw method for cluster membership was applied to estimate p-values and posterior inclusion probabilities (PIPs), both of which guided feature selection. To demonstrate the utility of the platform, we employed a temporal proteomics dataset that captured the proteome-wide dynamics of oxidative stress-induced post-translational modifications (O-PTMs) in mouse hearts undergoing isoproterenol (ISO)-induced hypertrophy.

CONCLUSION We have created a platform, CV.Signature.TCP, to identify distinct temporal clusters in omics datasets. We presented a cardiovascular use case to demonstrate its utility in unveiling biological insights underlying O-PTM regulation in cardiac remodeling. The platform is implemented as an open-source R package (https://github.com/UCLA-BD2K/CV.Signature.TCP).
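
The three-module workflow can be illustrated with a minimal Python sketch. It is not the CV.Signature.TCP package (which is written in R) and does not reproduce its API: the function names `pca_impute`, `kmeans`, and `jackstraw_pvalues` are hypothetical, the input is assumed to be a features-by-time-points matrix with `NaN` marking missing values, and only the PCA and K-means options are shown. The Module III step is a simplified stand-in for the jackstraw test of cluster membership: it uses squared distance to the assigned cluster center instead of an F-statistic and omits the PIP calculation.

```python
import numpy as np

def pca_impute(X, rank=2, n_iter=50):
    """Module I (sketch): impute NaNs and denoise via an iterated low-rank SVD."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the per-time-point (column) mean of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        # Observed entries stay fixed; only missing entries are updated.
        filled = np.where(missing, low_rank, X)
    return low_rank  # denoised matrix with missing entries imputed

def kmeans(X, k, n_iter=100, init=None, seed=0):
    """Module II (sketch): Lloyd's K-means with farthest-first initialization."""
    if init is None:
        rng = np.random.default_rng(seed)
        centers = [X[rng.integers(len(X))]]
        while len(centers) < k:
            # Next center: the row farthest from all centers chosen so far.
            d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[d.argmax()])
        centers = np.array(centers)
    else:
        centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    return labels, centers

def jackstraw_pvalues(X, k, n_null=200, s=1, seed=0):
    """Module III (simplified sketch): permutation p-values for cluster membership."""
    rng = np.random.default_rng(seed)
    labels, centers = kmeans(X, k, seed=seed)
    # Observed statistic: squared distance to the assigned center (small = good fit).
    obs = ((X - centers[labels]) ** 2).sum(axis=1)
    null = []
    for _ in range(n_null):
        Xb = X.copy()
        idx = rng.choice(len(X), size=s, replace=False)
        for i in idx:
            Xb[i] = rng.permutation(Xb[i])  # shuffle time points of s rows
        # Re-cluster the perturbed data, starting from the observed centers.
        lb, cb = kmeans(Xb, k, init=centers)
        null.extend(((Xb[idx] - cb[lb[idx]]) ** 2).sum(axis=1))
    null = np.array(null)
    # Fraction of null statistics at least as small as each observed statistic.
    pvals = (1 + (null[None, :] <= obs[:, None]).sum(axis=1)) / (1 + len(null))
    return labels, pvals
```

Under these assumptions, a typical call sequence would be `Z = pca_impute(X)`, followed by `labels, pvals = jackstraw_pvalues(Z, k)`, with features retained for a temporal cluster only when their p-value is small.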

[1]  I. Jolliffe Principal Component Analysis , 2002 .

[2]  Henning Hermjakob,et al.  The Reactome pathway Knowledgebase , 2015, Nucleic acids research.

[3]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[4]  Daniel B. McClatchy,et al.  Pulsed Azidohomoalanine Labeling in Mammals (PALM) Detects Changes in Liver-Specific LKB1 Knockout Mice , 2015, Journal of proteome research.

[5]  B. Silverman,et al.  Some Aspects of the Spline Smoothing Approach to Non‐Parametric Regression Curve Fitting , 1985 .

[6]  H. Wold Path Models with Latent Variables: The NIPALS Approach , 1975 .

[7]  T. Vondriska,et al.  Metabolism, Epigenetics, and Causal Inference in Heart Failure , 2019, Trends in Endocrinology & Metabolism.

[8]  Neo Christopher Chung,et al.  Statistical significance of cluster membership for unsupervised evaluation of cell identities , 2020, Bioinform..

[9]  T. Minamino,et al.  Physiological and pathological cardiac hypertrophy. , 2016, Journal of molecular and cellular cardiology.

[10]  Jennifer E Van Eyk,et al.  Chasing Cysteine Oxidative Modifications: Proteomic Tools for Characterizing Cysteine Redox Status , 2012, Circulation. Cardiovascular genetics.

[11]  Katie J. Clowers,et al.  Nudt21 Controls Cell Fate by Connecting Alternative Polyadenylation to Chromatin Signaling , 2018, Cell.

[12]  Brian J. Bleakley,et al.  Integrated omics dissection of proteome dynamics during cardiac remodeling , 2018, Nature Communications.

[13]  I. W. Wright Splines in Statistics , 1983 .

[14]  Peipei Ping,et al.  Integrated Dissection of Cysteine Oxidative Post-translational Modification Proteome During Cardiac Hypertrophy. , 2018, Journal of proteome research.

[15]  John D. Storey,et al.  Statistical significance of variables driving systematic variation in high-dimensional data , 2013, Bioinform..