A data-driven approach to preprocessing Illumina 450K methylation array data

Background: As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis, but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets.

Results: The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100), where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive.

Conclusions: Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes.
For the convenience of the research community, we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor and compatible with the existing methylumi, minfi and IMA packages, that allows others to apply the same normalization methods and data quality tests to 450K data.
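
To make the quantities in the Results concrete, the following is a minimal Python sketch of the β index and of quantile-normalizing M and U separately within each assay type before computing β. It is not the wateRmelon implementation (which is an R package); the function names `beta_values`, `quantile_normalize` and `normalize_by_design`, the probes-by-samples matrix layout, and the stable-sort tie handling are all assumptions made for illustration.

```python
import numpy as np


def beta_values(M, U, offset=100.0):
    """Beta = M / (M + U + offset), with the offset of 100 given in the abstract."""
    M = np.asarray(M, dtype=float)
    U = np.asarray(U, dtype=float)
    return M / (M + U + offset)


def quantile_normalize(X):
    """Plain quantile normalization of a probes-by-samples matrix.

    Every column (sample) is mapped onto a shared reference distribution:
    the row-wise mean of the column-sorted matrix. Ties are broken by
    position (stable sort), which is an illustrative simplification.
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each value in its column
    reference = np.sort(X, axis=0).mean(axis=1)       # mean of the k-th smallest values
    return reference[ranks]


def normalize_by_design(M, U, is_type1):
    """Quantile-normalize M and U separately, and Type I / Type II probes separately.

    `is_type1` is a hypothetical boolean vector with one entry per probe (row).
    Returns betas computed from the normalized intensities.
    """
    M = np.asarray(M, dtype=float)
    U = np.asarray(U, dtype=float)
    is_type1 = np.asarray(is_type1, dtype=bool)
    M_norm = np.empty_like(M)
    U_norm = np.empty_like(U)
    for mask in (is_type1, ~is_type1):
        if mask.any():
            M_norm[mask] = quantile_normalize(M[mask])
            U_norm[mask] = quantile_normalize(U[mask])
    return beta_values(M_norm, U_norm)
```

With non-negative intensities, the offset keeps β strictly below 1 and avoids division by zero when both intensities are zero. The order of operations in `normalize_by_design` mirrors the abstract's statement that normalizing M and U separately is preferable to normalizing betas directly.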
