Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication

Lung cancer is the most common cancer type and has the highest mortality rate; lung adenocarcinoma (LUAD) accounts for about 40% of lung cancer cases. A prognosis prediction model for LUAD is therefore urgently needed. Previous LUAD prognosis studies considered only single-omics data, such as mRNA or miRNA. To address this limitation, we propose a deep learning-based autoencoding approach that combines four types of omics data: mRNA, miRNA, DNA methylation, and copy number variations. The resulting autoencoder model learned representative features that separate patients into two optimal subgroups with a significant difference in survival (P = 4.08e-09) and a good concordance index (C-index = 0.65). The multi-omics model was validated on four independent datasets: GSE81089 for mRNA (n = 198, P = 0.0083), GSE63805 for miRNA (n = 32, P = 0.018), GSE63384 for DNA methylation (n = 35, P = 0.009), and independent TCGA samples for copy number variations (n = 94, P = 0.0052). Finally, a functional analysis was performed on the two survival subgroups to identify genes involved in relevant biological processes and pathways. This is the first study to incorporate deep autoencoding and four-omics data to construct a robust survival prediction model, and the results show that the approach is useful for predicting LUAD prognosis.
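The C-index reported above measures how often a model ranks pairs of patients in the same order as their observed survival. The following is a minimal sketch of that computation, assuming the standard Harrell's formulation for right-censored data; the abstract does not state which variant was used. The function name `concordance_index` and the toy values in the usage note are hypothetical illustrations, not data or code from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- predicted risk; higher should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time ends in an observed event (not a censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

As a usage example, with four hypothetical patients `times = [5, 10, 15, 20]` and `events = [1, 1, 0, 1]`, risk scores that decrease as survival time increases give a C-index of 1.0, identical scores for everyone give 0.5 (no better than chance), and fully reversed scores give 0.0. A value of 0.65 therefore indicates ranking that is better than chance but far from perfect.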
