Unsupervised classification of multi-omics data during cardiac remodeling using deep learning.

Integration of multi-omics data in cardiovascular diseases (CVDs) holds great potential for translational discoveries. By analyzing the abundance levels of heterogeneous molecules over time, we may uncover biological interactions and networks that were previously unidentifiable. However, to perform integrative analysis of temporal multi-omics effectively, computational methods must account for the heterogeneity and complexity of the data. To this end, we performed unsupervised classification of proteins and metabolites in mice during cardiac remodeling using two innovative deep learning (DL) approaches. First, a long short-term memory (LSTM)-based variational autoencoder (LSTM-VAE) was trained on time-series numeric data, and the low-dimensional embeddings extracted from the LSTM-VAE were then used for clustering. Second, deep convolutional embedded clustering (DCEC) was applied to images of temporal trends. Instead of a two-step procedure, DCEC performs a joint optimization of image reconstruction and cluster assignment. Additionally, we performed K-means clustering, partitioning around medoids (PAM), and hierarchical clustering. Pathway enrichment analysis using the Reactome knowledgebase demonstrated that the DL methods yielded more significant biological pathways than the conventional clustering algorithms. In particular, DCEC yielded the highest number of enriched pathways, suggesting the strength of its unified framework based on visual similarities. Overall, these results show that unsupervised DL is a promising approach for integrative analysis of temporal multi-omics data.
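The joint optimization that separates DCEC from the two-step LSTM-VAE procedure can be sketched with the clustering layer from deep embedded clustering (DEC), which DCEC adopts. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, the Student's t kernel with α = 1 and the weight γ = 0.1 on the clustering loss are assumed values, and the convolutional autoencoder that would produce the embeddings `z` and the reconstructions `x_rec` is omitted.

```python
import numpy as np


def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of embedding z_i to centroid mu_j (DEC-style)."""
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize so each sample's assignment probabilities sum to 1.
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)  # divide column j by soft cluster frequency f_j
    return weight / weight.sum(axis=1, keepdims=True)


def clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), summed over samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def joint_loss(x, x_rec, p, q, gamma=0.1):
    """Reconstruction MSE plus gamma-weighted clustering loss (gamma is an assumed value)."""
    reconstruction = float(np.mean((x - x_rec) ** 2))
    return reconstruction + gamma * clustering_loss(p, q)


if __name__ == "__main__":
    # Toy embeddings: two well-separated groups in a 2-D latent space (hypothetical data).
    rng = np.random.default_rng(0)
    z = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
    mu = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical initial centroids
    q = soft_assign(z, mu)
    p = target_distribution(q)
    print("cluster labels:", q.argmax(axis=1))
    print("KL(P || Q):", clustering_loss(p, q))
```

In a full model, gradients of the joint loss would update both the encoder and the centroids, so cluster assignment shapes the embedding rather than being applied to a fixed embedding afterwards, as in the two-step LSTM-VAE pipeline.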
