A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data

DNA methylation is an important epigenetic event that affects gene expression during development and in various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, tens of probes are associated with each gene. Given the methylation intensities of all these probes, it is necessary to determine which of them are most representative of the gene-centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that uses different classification methods to compute gene-centric DNA methylation from probe-level DNA methylation data. We compared our algorithm to other feature selection algorithms, namely support vector machines with recursive feature elimination, genetic algorithms, and ReliefF. We evaluated all methods by how well the selected probes of a gene predicted that gene's mRNA expression level, and found that K-Nearest Neighbors classification combined with sequential forward selection outperformed the other algorithms on all metrics. We also observed that the transcriptional activities of certain genes were more sensitive to DNA methylation changes than those of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
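To make the core idea concrete, the following is a minimal sketch of sequential forward selection wrapped around a k-nearest-neighbors classifier. It is not the authors' implementation. It assumes that each gene's expression has been discretized into classes (here "high" and "low"), that candidate probe subsets are scored by leave-one-out accuracy, and that selection stops when no remaining probe improves the score. The function names, the stopping rule, and the toy beta values are all hypothetical.

```python
from collections import Counter


def knn_loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of a k-NN classifier using only `features`."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared Euclidean distance to every other sample on the chosen probes.
        dists = []
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j == i:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            dists.append((d, j, yj))
        dists.sort()
        # Majority vote among the k nearest neighbors.
        votes = Counter(label for _, _, label in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == yi
    return correct / len(X)


def sequential_forward_selection(X, y, k=3, max_features=None):
    """Greedily add the probe that most improves leave-one-out k-NN accuracy."""
    remaining = list(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining and (max_features is None or len(selected) < max_features):
        # Score each candidate probe together with the probes already chosen;
        # ties are broken in favor of the lowest probe index.
        scored = [(knn_loo_accuracy(X, y, selected + [f], k), -f, f)
                  for f in remaining]
        score, _, best_f = max(scored)
        if score <= best_score:  # assumed stopping rule: no improvement
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected, best_score


# Hypothetical toy data: rows are samples, columns are beta values of three
# probes mapped to one gene; labels are that gene's discretized expression.
X = [
    [0.50, 0.10, 0.80],
    [0.55, 0.15, 0.20],
    [0.45, 0.12, 0.75],
    [0.52, 0.90, 0.25],
    [0.48, 0.85, 0.78],
    [0.51, 0.88, 0.22],
]
y = ["high", "high", "high", "low", "low", "low"]

selected, score = sequential_forward_selection(X, y, k=3)
print(selected, score)
```

In this toy example only the second probe (index 1) separates the two expression classes, so the greedy search selects it and stops. On real data, the score would normally be estimated on held-out samples rather than on the same samples used for selection.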
