Prediction of Alzheimer's disease based on deep neural network by integrating gene expression and DNA methylation dataset

Abstract

Motivation: The molecular mechanism of Alzheimer's disease (AD) has not been clearly revealed, and there is no clinically reliable genetic risk factor. Diagnosis of AD has therefore relied mostly on neuropsychological tests and on the analysis of brain images such as magnetic resonance imaging. Studies of the molecular-level mechanism of AD have lacked data owing to the difficulty of sampling the posterior brains of normal individuals and AD patients; however, recent studies have produced and analyzed large-scale omics data for brain areas such as the prefrontal cortex. It is therefore necessary to develop AD diagnosis or prediction methods based on these data.

Results: This paper proposes a deep learning-based model that predicts AD from large-scale gene expression and DNA methylation data. The most challenging problems in constructing a model to diagnose AD from a multi-omics dataset are how to integrate the different omics data and how to deal with high-dimensional, low-sample-size data. To address these problems, we propose a novel but simple approach that reduces the number of features based on differentially expressed genes and differentially methylated positions in the multi-omics dataset. Moreover, we develop a deep neural network-based prediction model that improves performance over conventional machine learning algorithms. Together, the feature selection method and the prediction model outperformed conventional machine learning algorithms that use typical dimension reduction methods. In addition, we demonstrate that integrating gene expression and DNA methylation data can improve prediction accuracy.

Availability: https://github.com/ChihyunPark/DNN_for_ADprediction
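The feature reduction and integration idea described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: a per-feature Welch t-statistic stands in for the differential expression and differential methylation analysis, the data are synthetic, and every name (`welch_t`, `top_k_features`, `integrate`, `k_expr`, `k_meth`) is hypothetical.

```python
import numpy as np

def welch_t(x_case, x_ctrl):
    """Per-feature Welch t-statistic between two groups (rows are samples)."""
    mean_diff = x_case.mean(axis=0) - x_ctrl.mean(axis=0)
    var_case = x_case.var(axis=0, ddof=1)
    var_ctrl = x_ctrl.var(axis=0, ddof=1)
    se = np.sqrt(var_case / len(x_case) + var_ctrl / len(x_ctrl))
    return mean_diff / np.maximum(se, 1e-12)  # guard against zero variance

def top_k_features(x, y, k):
    """Indices of the k features with the largest |t| between y==1 and y==0."""
    t = welch_t(x[y == 1], x[y == 0])
    return np.sort(np.argsort(-np.abs(t))[:k])

def integrate(expr, meth, y, k_expr, k_meth):
    """Select features in each omics layer separately, then concatenate."""
    gene_idx = top_k_features(expr, y, k_expr)
    cpg_idx = top_k_features(meth, y, k_meth)
    combined = np.hstack([expr[:, gene_idx], meth[:, cpg_idx]])
    return combined, gene_idx, cpg_idx

# Synthetic stand-in data (assumption): 40 samples, 500 genes, 800 CpG sites.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)      # 0 = control, 1 = AD
expr = rng.normal(size=(40, 500))
meth = rng.normal(size=(40, 800))
expr[y == 1, :5] += 3.0                # plant 5 "differentially expressed" genes
meth[y == 1, :5] -= 3.0                # plant 5 "differentially methylated" sites

X, gene_idx, cpg_idx = integrate(expr, meth, y, k_expr=5, k_meth=5)
print(X.shape, gene_idx, cpg_idx)
```

The combined matrix `X` would then be passed to a classifier such as a deep neural network. In a real evaluation, the selection step should be fitted on training samples only, so that test labels do not leak into the chosen features.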
