MapReduce-Based Parallel Genetic Algorithm for CpG-Site Selection in Age Prediction

Genomic biomarkers such as DNA methylation (DNAm) are used for age prediction. In recent years, several studies have reported an association between changes in DNAm and human age. The high dimensionality of this type of data significantly increases the execution time of modeling algorithms. To mitigate this problem, we propose a two-stage parallel algorithm for selecting age-related CpG-sites. In the first stage, the algorithm clusters the data into groups of similar age range. In the second stage, a parallel genetic algorithm (PGA) based on the MapReduce paradigm (MR-based PGA) selects the age-related features of each age range. In the proposed method, the execution of the algorithm for each age range (data parallel), the evaluation of chromosomes (task parallel), and the calculation of the fitness function (data parallel) are performed using a novel parallel framework. We consider 16 DNAm datasets from healthy human blood tissue, each containing the relevant age information. These datasets are combined into a single set, which is then randomly divided into training and test sets with a ratio of 7:3. We build a Gradient Boosting Regressor (GBR) model on the CpG-sites selected from the training set. To evaluate model accuracy, we compare our results with state-of-the-art approaches that used these datasets and observe that our method performs better on the unseen test set, with a Mean Absolute Deviation (MAD) of 3.62 years and a correlation (R²) of 95.96% between age and DNAm. On the training data, the MAD and R² are 1.27 years and 99.27%, respectively. Finally, we evaluate the effect of parallelization on computation time. Without parallelization, the algorithm requires 4123 min to complete, whereas the parallelized execution on 3 computing machines with 32 processing cores each takes a total of 58 min. This shows that our proposed algorithm is both efficient and scalable.
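
The task-parallel evaluation of chromosomes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`fitness`, `map_step`, `reduce_step`) and the fitness definition (MAD of a one-variable least-squares fit on the mean of the selected sites) are all assumptions made for illustration, and a local thread pool stands in for the map phase of a real MapReduce cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial, reduce
from statistics import mean

# Toy data (hypothetical): 4 samples x 3 CpG-sites, plus sample ages.
# Site 0 tracks age linearly, site 1 is noise, site 2 is constant.
X = [
    [0.1, 0.9, 0.5],
    [0.2, 0.1, 0.5],
    [0.3, 0.7, 0.5],
    [0.4, 0.2, 0.5],
]
ages = [10.0, 20.0, 30.0, 40.0]


def fitness(chromosome, X, ages):
    """Assumed fitness: MAD of a simple linear fit of age on the mean
    methylation of the selected sites (lower is better)."""
    selected = [j for j, bit in enumerate(chromosome) if bit]
    if not selected:
        return float("inf")  # an empty feature set cannot predict age
    z = [mean([row[j] for j in selected]) for row in X]
    z_bar, a_bar = mean(z), mean(ages)
    var = sum((zi - z_bar) ** 2 for zi in z)
    if var == 0:
        return float("inf")  # a constant predictor carries no signal
    slope = sum((zi - z_bar) * (ai - a_bar) for zi, ai in zip(z, ages)) / var
    intercept = a_bar - slope * z_bar
    return mean([abs(ai - (slope * zi + intercept)) for zi, ai in zip(z, ages)])


def map_step(chromosome, X, ages):
    """Map: score one chromosome independently of all the others."""
    return (fitness(chromosome, X, ages), chromosome)


def reduce_step(best, candidate):
    """Reduce: keep the (fitness, chromosome) pair with the lower MAD."""
    return candidate if candidate[0] < best[0] else best


# One GA generation: each chromosome is a bit mask over the CpG-sites.
population = [(0, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 0)]

with ThreadPoolExecutor(max_workers=4) as pool:
    scored = list(pool.map(partial(map_step, X=X, ages=ages), population))

best_fitness, best_chromosome = reduce(reduce_step, scored)
print(best_chromosome, best_fitness)
```

Because each chromosome is scored independently, the map phase can be distributed across workers without coordination, and only the reduce phase needs to see all the scores. In a full genetic algorithm, selection, crossover and mutation would then produce the next population from these scores.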
