An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform

The proper identification of differentially methylated CpGs is central to most epigenetic studies. The Illumina HumanMethylation450 BeadChip is widely used to quantify DNA methylation; nevertheless, designing an appropriate analysis pipeline is difficult because biological and technical variability are convolved and because the Infinium I and II probe design types produce biased signals relative to each other. Despite recent attempts to investigate how to analyze DNA methylation data from this array design, a comprehensive comparison of bioinformatics pipelines has not been possible, because appropriate data sets with both a large sample size and a sufficient number of technical replicates were lacking. Here we perform such a comparative analysis, addressing three problems: reducing technical variability, eliminating the probe design bias and reducing batch effects. We exploit two unpublished data sets that include technical replicates and were profiled for DNA methylation in peripheral blood, monocytes or muscle biopsies. We evaluated the performance of different analysis pipelines and demonstrated that: (1) it is critical to correct for the probe design type, since the amplitude of the measured methylation change depends on the underlying chemistry; (2) the effect of different normalization schemes is mixed, and the most effective methods in our hands were quantile normalization and Beta Mixture Quantile dilation (BMIQ); (3) it is beneficial to correct for batch effects. In conclusion, our comparative analysis using a comprehensive data set suggests an efficient pipeline for the proper identification of differentially methylated CpGs using the Illumina 450K arrays.
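As an illustration of one normalization scheme named above, the following is a minimal sketch of generic between-sample quantile normalization. It is not the pipeline evaluated in the paper: the function name `quantile_normalize`, the list-of-lists input layout and the position-based tie handling are assumptions made for this example, and the sketch does nothing about the Infinium I/II probe design bias, which is the problem BMIQ targets.

```python
def quantile_normalize(samples):
    """Map every sample onto a common reference distribution.

    `samples` is a list of equal-length lists, one list of methylation
    values per array (hypothetical layout chosen for this sketch).
    Ties are broken by probe position, which is a simplification.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # For each sample, probe indices ordered by increasing value.
    orders = [sorted(range(n_probes), key=s.__getitem__) for s in samples]

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[order[k]] for s, order in zip(samples, orders)) / len(samples)
        for k in range(n_probes)
    ]

    # Assign each probe the reference value matching its rank in its sample.
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    arrays = [[0.9, 0.1, 0.5], [0.2, 0.4, 0.6]]
    for row in quantile_normalize(arrays):
        print(row)
```

After normalization every sample has the same set of sorted values, while the rank order of probes within each sample is preserved. In practice one would more likely rely on an established implementation than on a hand-written routine like this one.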
