MethRaFo: MeDIP‐seq methylation estimate using a Random Forest Regressor

Motivation: Genome-wide DNA methylation profiling is now routinely performed when studying development, cancer and several other biological processes. Although whole-genome bisulfite sequencing provides high-quality methylation measurements at single-nucleotide resolution, it is relatively costly, so several studies have used alternative profiling methods. One of the most widely used low-cost alternatives is MeDIP-Seq. However, MeDIP-Seq is biased toward CpG-enriched regions, so its results need to be corrected in order to determine accurate methylation levels.

Results: Here we present a method for correcting MeDIP-Seq results based on Random Forest regression. Applying the method to real data from several different tissues (brain, cortex, penis), we show that it achieves an almost 4-fold decrease in run time while increasing accuracy by as much as 20% over prior methods developed for this task.

Availability and implementation: MethRaFo is freely available as a Python package (with an R wrapper) at https://github.com/phoenixding/methrafo.

Contact: zivbj@cs.cmu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
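
The idea of correcting an enrichment-based signal with Random Forest regression can be illustrated with a minimal sketch. It is not the MethRaFo implementation: the features (a per-bin read count and a CpG density), the simulated data, and all variable names are assumptions made for illustration, and scikit-learn's `RandomForestRegressor` stands in for whatever model configuration the package actually uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT real MeDIP-Seq): one row per genomic bin.
n_bins = 2000
cpg_density = rng.uniform(0.0, 1.0, n_bins)        # hypothetical feature
true_methylation = rng.uniform(0.0, 1.0, n_bins)   # stands in for bisulfite-derived levels

# Simulated enrichment bias: read counts grow with methylation AND CpG density,
# so raw counts alone are not a faithful methylation estimate.
read_count = rng.poisson(1 + 50 * true_methylation * cpg_density)

# Assumed feature matrix: raw read count plus local CpG density.
X = np.column_stack([read_count, cpg_density])
y = true_methylation

# Train on the first 1500 bins, evaluate on the remaining 500.
train, test = slice(0, 1500), slice(1500, None)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X[train], y[train])
predicted = model.predict(X[test])

# Pearson correlation between predicted and held-out methylation levels.
correlation = np.corrcoef(predicted, y[test])[0, 1]
print(f"held-out Pearson correlation: {correlation:.3f}")
```

Because each tree predicts an average of training targets, the forest's output stays within the range of the training methylation levels, which is convenient when the target is a fraction between 0 and 1.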
