Exploring DNA Methylation Data of Lung Cancer Samples with Variational Autoencoders

Lung cancer causes over one million deaths each year worldwide. DNA methylation is a well-characterized epigenetic factor, and methylation profiles are widely used as input features when training models on genomic data. In this article, we explore applications of an unsupervised deep learning method, the variational autoencoder, to DNA methylation data of lung cancer samples downloaded from the GDC TCGA project, and we carry out further analyses on the learned latent features. We show that a logistic regression classifier trained on the encoded latent features accurately classifies cancer subtypes.
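As a rough illustration of the objective a variational autoencoder minimizes, the sketch below computes the per-sample loss as a reconstruction term plus a KL divergence term, and shows the reparameterization step used to sample latent features. It is a minimal sketch under stated assumptions, not the model used in this work: the function names are hypothetical, the binary cross-entropy reconstruction term is an assumption (chosen because methylation beta values lie in [0, 1]), and a real model would learn the encoder and decoder with a deep learning library.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Reconstruction loss for inputs in [0, 1] (assumed loss choice)."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def vae_loss(x, x_hat, mu, log_var):
    """Negative evidence lower bound for one sample: reconstruction + KL."""
    return binary_cross_entropy(x, x_hat) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.1, 0.8, 0.5]           # toy methylation beta values (made up)
    x_hat = [0.2, 0.7, 0.5]       # toy decoder output (made up)
    mu, log_var = [0.3, -0.2], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("loss:", vae_loss(x, x_hat, mu, log_var))
```

In a pipeline like the one described here, the encoder mean `mu` for each sample would serve as the latent feature vector passed to a downstream classifier such as logistic regression.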
