Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing

With recent advances in sequencing technology, it is now feasible to measure DNA methylation at tens of millions of sites across the entire genome. In most applications, biologists are interested in detecting differentially methylated regions, composed of multiple sites with differing methylation levels among populations. However, current computational approaches for detecting such regions do not provide accurate statistical inference. A major challenge in reporting uncertainty is that detecting these regions involves a genome-wide scan, which must be accounted for. A further challenge is that sample sizes are often small because of the cost of the technology. We have developed a new approach that overcomes these challenges and rigorously assesses uncertainty for differentially methylated regions. Region-level statistics are obtained by fitting a generalized least squares regression model, with a nested autoregressive correlated error structure, for the effect of interest on transformed methylation proportions. We develop an inferential approach, based on a pooled null distribution, that can be implemented even when only two samples per population are available. We demonstrate the advantages of our method using both experimental data and Monte Carlo simulation, and find that it improves the specificity and sensitivity of lists of regions and accurately controls the false discovery rate.
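The following is a minimal Python/NumPy sketch of the inferential ingredients named above: a region-level generalized least squares (GLS) Wald statistic with errors correlated across CpGs within each sample and independent across samples, p-values from a null distribution pooled over all regions and label permutations, and a false discovery rate adjustment. It is an illustration under stated assumptions, not the authors' implementation. The following are assumptions: the transform asin(2p - 1), the design with one intercept per CpG plus a shared group effect, a fixed (not estimated) continuous-distance AR(1) parameter `rho` per base pair, candidate regions held fixed across permutations, and Benjamini-Hochberg as the FDR step. All function names are hypothetical, and the input data are simulated.

```python
import itertools

import numpy as np


def arcsine_transform(meth, cov):
    """Map methylation proportions p = meth / cov to asin(2p - 1) (assumed transform)."""
    return np.arcsin(2.0 * meth / cov - 1.0)


def car1_corr(pos, rho):
    """Continuous-distance AR(1) correlation rho**|d| between CpGs at positions pos (bp)."""
    d = np.abs(pos[:, None] - pos[None, :])
    return rho ** d


def region_stat(y, pos, group, rho=0.99):
    """GLS Wald statistic (estimate / standard error) for the group effect in one region.

    y: (n_samples, n_cpg) transformed proportions; group: (n_samples,) 0/1 labels.
    Errors are correlated across CpGs within a sample (nested AR(1) blocks) and
    independent across samples; rho is fixed here, not estimated (assumption).
    """
    n_s, n_c = y.shape
    # Hypothetical design: one intercept per CpG plus a shared group effect (last column).
    X = np.hstack([np.tile(np.eye(n_c), (n_s, 1)),
                   np.repeat(group, n_c).astype(float)[:, None]])
    yv = y.reshape(-1)  # sample-major order, matching the rows of X
    Vinv = np.kron(np.eye(n_s), np.linalg.inv(car1_corr(pos, rho)))  # block-diagonal
    XtVi = X.T @ Vinv
    A = XtVi @ X
    beta = np.linalg.solve(A, XtVi @ yv)
    resid = yv - X @ beta
    sigma2 = resid @ Vinv @ resid / (len(yv) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(A)[-1, -1])
    return beta[-1] / se


def pooled_null_pvalues(regions, group):
    """Permutation p-values against a null pooled over all regions and relabelings."""
    obs = np.array([region_stat(y, pos, group) for y, pos in regions])
    n, n1 = len(group), int(group.sum())
    null = []
    for idx in itertools.combinations(range(n), n1):
        perm = np.zeros(n, dtype=int)
        perm[list(idx)] = 1
        if np.array_equal(perm, group) or np.array_equal(perm, 1 - group):
            continue  # skip the observed labeling and its mirror image
        null.extend(region_stat(y, pos, perm) for y, pos in regions)
    null = np.abs(np.array(null))
    # Fraction of pooled null |statistics| at least as extreme as each observed |statistic|.
    pvals = np.array([(1 + np.sum(null >= abs(s))) / (1 + null.size) for s in obs])
    return obs, pvals


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]  # running minimum from the largest p
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out


# Toy usage: 2 vs 2 samples, 20 simulated regions of 6 CpGs, first 5 differentially methylated.
rng = np.random.default_rng(1)
group = np.array([0, 0, 1, 1])
regions = []
for k in range(20):
    pos = np.sort(rng.choice(2000, size=6, replace=False)).astype(float)
    cov = rng.integers(10, 30, size=(4, 6))
    p = np.full((4, 6), 0.5)
    if k < 5:
        p[group == 1] = 0.8
    meth = rng.binomial(cov, p)
    regions.append((arcsine_transform(meth, cov), pos))
stats, pvals = pooled_null_pvalues(regions, group)
qvals = benjamini_hochberg(pvals)
```

With two samples per group there are only six balanced relabelings of four samples, and four after dropping the observed labeling and its mirror, so a per-region permutation p-value could never fall below 1/5. Pooling the null statistics from all 20 regions gives 80 reference values instead, which is the motivation for a pooled null when replicates are scarce.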
