A multi-omics supervised autoencoder for pan-cancer clinical outcome endpoints prediction

Background
With the rapid development of sequencing technologies, collecting diverse types of cancer omics data has become more cost-effective. Many computational methods have attempted to represent and fuse multiple omics into a comprehensive view of cancer. However, different types of omics are both related and heterogeneous. Most existing methods do not account for the differences between omics, so the biological knowledge carried by individual omics may not be fully exploited. Moreover, for a given task (e.g., predicting overall survival), these methods tend to rely on sample similarity or domain knowledge to learn a more reasonable representation of the omics, which alone is not sufficient.

Methods
To learn more useful representations of individual omics and fuse them to improve predictive ability, we proposed an autoencoder-based method named MOSAE (Multi-omics Supervised Autoencoder). In our method, a specific autoencoder was designed for each omics type according to its dimensionality to generate omics-specific representations. Then, a supervised autoencoder was constructed on top of each specific autoencoder, using the labels to force it to learn representations that are both omics-specific and task-specific. Finally, the representations that the supervised autoencoders generated for the different omics were fused in a traditional but powerful way, and the fused representation was used for the subsequent prediction tasks.

Results
We applied our method to the TCGA Pan-Cancer dataset to predict four clinical outcome endpoints: overall survival (OS), progression-free interval (PFI), disease-free interval (DFI), and disease-specific survival (DSS). Compared with traditional and state-of-the-art methods, MOSAE achieved better predictive performance. We also tested the effect of each proposed improvement, and each one had a positive effect on predictive performance.

Conclusions
Predicting clinical outcome endpoints is very important for precision and personalized medicine, and multi-omics fusion is an effective way to address this problem. MOSAE is a powerful multi-omics fusion method that generates both omics-specific and task-specific representations for a given endpoint prediction task and improves predictive performance.
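The pipeline in the Methods paragraph can be sketched in NumPy. It runs per-omics autoencoders sized by input dimension, adds a label-driven loss on each code, and then fuses the codes. This is a minimal illustration under stated assumptions, not MOSAE's implementation. All of the following are hypothetical choices: the sizing rule `latent_size`, the one-hidden-layer tanh architecture, the loss weight `alpha`, a binary endpoint label (for example, whether an event occurred; the abstract does not state MOSAE's supervised loss), and concatenation as the fusion step (the abstract only calls it "a traditional but powerful way"). Reconstruction and supervision are simply optimized jointly here with full-batch gradient descent.

```python
import numpy as np


def latent_size(input_dim, ratio=10, min_size=8):
    """Hypothetical sizing rule: shrink each omics by a fixed ratio, with a floor."""
    return max(min_size, input_dim // ratio)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SupervisedAE:
    """One-hidden-layer autoencoder with a logistic head on the code (a sketch).

    Loss = mean squared reconstruction error + alpha * binary cross-entropy.
    The binary label is an assumption standing in for an endpoint such as OS.
    """

    def __init__(self, input_dim, alpha=1.0, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        h = latent_size(input_dim)
        self.alpha, self.lr = alpha, lr
        self.W_e = rng.normal(0.0, 1.0 / np.sqrt(input_dim), (input_dim, h))
        self.b_e = np.zeros(h)
        self.W_d = rng.normal(0.0, 1.0 / np.sqrt(h), (h, input_dim))
        self.b_d = np.zeros(input_dim)
        self.w_p = np.zeros(h)  # prediction head weights
        self.b_p = 0.0

    def encode(self, X):
        return np.tanh(X @ self.W_e + self.b_e)

    def loss(self, X, y):
        z = self.encode(X)
        rec = np.mean((z @ self.W_d + self.b_d - X) ** 2)
        p = np.clip(sigmoid(z @ self.w_p + self.b_p), 1e-7, 1 - 1e-7)
        bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        return rec + self.alpha * bce

    def step(self, X, y):
        n, d = X.shape
        z = self.encode(X)
        X_hat = z @ self.W_d + self.b_d
        p = sigmoid(z @ self.w_p + self.b_p)
        # Gradients of each loss term w.r.t. decoder output and logits.
        g_rec = 2.0 * (X_hat - X) / (n * d)
        g_log = self.alpha * (p - y) / n
        # Both terms backpropagate into the shared code z, then through tanh.
        g_z = g_rec @ self.W_d.T + np.outer(g_log, self.w_p)
        g_pre = g_z * (1.0 - z ** 2)
        # All gradients are computed before any parameter is updated.
        grads = {
            "W_d": z.T @ g_rec, "b_d": g_rec.sum(axis=0),
            "w_p": z.T @ g_log, "b_p": g_log.sum(),
            "W_e": X.T @ g_pre, "b_e": g_pre.sum(axis=0),
        }
        for name, g in grads.items():
            setattr(self, name, getattr(self, name) - self.lr * g)

    def fit(self, X, y, epochs=200):
        for _ in range(epochs):
            self.step(X, y)
        return self


def fuse(models, omics):
    """Assumed fusion: concatenate the per-omics codes column-wise."""
    return np.hstack([m.encode(X) for m, X in zip(models, omics)])
```

For two omics matrices `X1` and `X2` with labels `y`, `fuse([SupervisedAE(X1.shape[1]).fit(X1, y), SupervisedAE(X2.shape[1]).fit(X2, y)], [X1, X2])` returns one fused matrix that could be passed to any downstream predictor. Raising `alpha` pushes each code toward the label (more task-specific), while lowering it favors reconstruction (more omics-specific).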