Across‐Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression

DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high‐throughput technologies have enabled genome‐wide profiling of DNA methylation. However, methylation profiling studies often use different designs and platforms with varying resolution, which hinders joint analysis of methylation data across platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model uses information from nonlocal probes to improve imputation quality. We compared the performance of our functional model with that of linear regression and the best single‐probe surrogate, both in real data and via simulations. Specifically, we applied these imputation approaches to an acute myeloid leukemia dataset of 194 samples. Our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post‐imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait‐associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data and can boost statistical power in downstream epigenome‐wide association studies (EWAS).
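The abstract does not spell out the model. The following is therefore only a minimal sketch of one common way to fit a penalized functional regression, assuming a P-spline formulation; it is not the authors' implementation. The functional predictor W_i(t) is the set of methylation levels of nonlocal probes measured on the lower-resolution platform, indexed by genomic position t. The response y_i is a target CpG site that, in the training samples, is observed only on the higher-resolution platform. The model y_i = α + ∫ W_i(t) γ(t) dt + ε_i is approximated by a sum over probes. The coefficient function γ(t) is expanded in a cubic B-spline basis, and its roughness is controlled by a second-difference penalty with weight λ. All of the following are hypothetical choices made for illustration: the function names (bspline_basis, fit_pfr, impute), the probe window, the number of basis functions, the fixed λ, and the simulated data. In practice λ would be tuned, for example by cross-validation, and local covariates could be added.

```python
# Hypothetical sketch: P-spline style penalized functional regression for
# imputing one target CpG from nonlocal probes. Not the authors' code.
import numpy as np


def bspline_basis(x, n_basis, degree=3):
    """Clamped, equally spaced B-spline basis on [0, 1] (Cox-de Boor recursion).

    Returns a (len(x), n_basis) matrix; requires n_basis >= degree + 1.
    """
    x = np.asarray(x, dtype=float)
    inner = np.linspace(0.0, 1.0, n_basis - degree + 1)
    knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    # Degree 0: indicator of each half-open knot interval [k_j, k_{j+1}).
    B = np.zeros((x.size, knots.size - 1))
    for j in range(knots.size - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Assign the right endpoint x == 1 to the last non-empty interval.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[x == 1.0, last] = 1.0
    # Raise the degree one step at a time; zero-length spans contribute nothing.
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, B.shape[1] - 1))
        for j in range(B_next.shape[1]):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_next[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = B_next
    return B


def fit_pfr(W, y, positions, n_basis=8, lam=1.0):
    """Penalized functional regression of y (n,) on the probe profile W (n x p).

    Model: y_i = alpha + sum_j W[i, j] * gamma(t_j) + error, where
    gamma(t) = B(t) @ b and lam * ||second differences of b||^2 is penalized.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Rescale the genomic coordinates of the nonlocal probes to [0, 1].
    t = (positions - positions.min()) / (positions.max() - positions.min())
    B = bspline_basis(t, n_basis)              # p x K basis at probe positions
    Z = W @ B                                  # n x K reduced design matrix
    z_mean, y_mean = Z.mean(axis=0), y.mean()  # centering handles the intercept
    Zc, yc = Z - z_mean, y - y_mean
    D = np.diff(np.eye(n_basis), n=2, axis=0)  # (K-2) x K second-difference operator
    b = np.linalg.solve(Zc.T @ Zc + lam * D.T @ D, Zc.T @ yc)
    alpha = y_mean - z_mean @ b
    return alpha, B @ b                        # intercept and gamma at each probe


def impute(model, W_new):
    """Predict the target CpG for samples measured only on the other platform."""
    alpha, gamma = model
    return alpha + np.asarray(W_new, dtype=float) @ gamma


if __name__ == "__main__":
    # Simulated toy data (hypothetical): 40 nonlocal probes, smooth true gamma(t).
    rng = np.random.default_rng(0)
    n, p = 200, 40
    positions = np.sort(rng.uniform(0, 20_000, p))  # probe coordinates in bp
    t = (positions - positions.min()) / (positions.max() - positions.min())
    W = rng.beta(2, 2, size=(n, p))                 # beta values of nonlocal probes
    y = 0.2 + W @ (np.sin(2 * np.pi * t) / p) + rng.normal(0, 0.005, n)
    model = fit_pfr(W[:150], y[:150], positions)    # train on samples with both platforms
    pred = impute(model, W[150:])                   # impute held-out samples
    print("held-out correlation:", np.corrcoef(pred, y[150:])[0, 1])
```

The penalty acts on second differences of the spline coefficients, and coefficients that follow a linear trend are not penalized. It therefore favors a smooth γ(t), so neighboring probes receive similar weights. This is how the sketch borrows strength from many nonlocal probes while avoiding the instability of an unpenalized linear regression with one free coefficient per probe.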
