seqlm: an MDL based method for identifying differentially methylated regions in high density methylation array data

Motivation: One of the main goals of large scale methylation studies is to detect differentially methylated loci. One way is to approach this problem sitewise, i.e. to find differentially methylated positions (DMPs). However, it has been shown that methylation is regulated in longer genomic regions, so it is more desirable to identify differentially methylated regions (DMRs) instead of DMPs. The new high coverage arrays, like Illumina's 450k platform, make this possible at a reasonable cost. A few tools exist for DMR identification from this type of data, but there is no standard approach.

Results: We propose a novel method for DMR identification that detects the region boundaries according to the minimum description length (MDL) principle, essentially solving the problem of model selection. The significance of the regions is established using linear mixed models. Using both simulated and large publicly available methylation datasets, we compare seqlm performance to alternative approaches. We demonstrate that it is both more sensitive and more specific than competing methods. This is achieved with minimal parameter tuning and, surprisingly, the quickest running time of all the methods tried. Finally, we show that the regional differential methylation patterns identified on sparse array data are confirmed by higher resolution sequencing approaches.

Availability and Implementation: The methods have been implemented in the R package seqlm, which is available through GitHub: https://github.com/raivokolde/seqlm

Contact: rkolde@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.
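To make the idea of choosing region boundaries by description length more concrete, the following Python sketch segments a sequence of per-probe values (for example, group differences in methylation along a chromosome) into piecewise-constant regions. It is an illustration only and not the seqlm implementation: the function name `mdl_segment`, the fixed noise variance `sigma2`, and the simple two-part cost (residual cost plus a `1.5 * log(n)` charge per segment for one mean and one boundary) are all assumptions made for this example. The actual package is written in R and scores segments with linear models.

```python
import math


def mdl_segment(values, sigma2=1.0):
    """Split `values` into piecewise-constant segments by minimising a
    simple two-part description length with dynamic programming.

    Hypothetical helper for illustration; assumes Gaussian noise with a
    known variance `sigma2`, which is not how seqlm itself is specified.
    """
    n = len(values)
    if n == 0:
        return []

    # Prefix sums let us compute any segment's residual sum of squares in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    # Assumed model cost per segment (in nats): 0.5*log(n) for the segment
    # mean plus log(n) for the position of its boundary.
    model_cost = 1.5 * math.log(n)

    def cost(i, j):
        """Description length of values[i:j] encoded as one segment."""
        m = j - i
        rss = s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m
        return max(rss, 0.0) / (2.0 * sigma2) + model_cost

    # best[j] is the minimal total description length of values[:j];
    # prev[j] is the start index of the last segment in that optimum.
    best = [0.0] + [math.inf] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j] = c
                prev[j] = i

    # Backtrack from the end to recover half-open (start, end) segments.
    segments = []
    j = n
    while j > 0:
        i = prev[j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments
```

For a toy input such as `[0.0] * 5 + [3.0] * 5 + [0.0] * 5`, the sketch recovers three segments with boundaries at positions 5 and 10, because splitting there removes all residual cost while paying the per-segment charge only three times. The quadratic-time loop is adequate for a short illustration; a practical tool would bound the segment length or use a faster search.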
