A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data

Motivation: The Illumina Infinium 450 k DNA Methylation Beadchip is a prime candidate technology for Epigenome-Wide Association Studies (EWAS). However, a difficulty associated with these beadarrays is that probes come in two different designs, characterized by widely different DNA methylation distributions and dynamic range, which may bias downstream analyses. A key statistical issue is therefore how best to adjust for the two different probe designs.

Results: Here we propose a novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. The strategy involves application of a three-state beta-mixture model to assign probes to methylation states, subsequent transformation of probabilities into quantiles and finally a methylation-dependent dilation transformation to preserve the monotonicity and continuity of the data. We validate our method on cell-line data, fresh frozen and paraffin-embedded tumour tissue samples and demonstrate that BMIQ compares favourably with two competing methods. Specifically, we show that BMIQ improves the robustness of the normalization procedure, reduces the technical variation and bias of type2 probe values and successfully eliminates the type1 enrichment bias caused by the lower dynamic range of type2 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450 k platform.

Availability: BMIQ is freely available from http://code.google.com/p/bmiq/.

Contact: a.teschendorff@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
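The probability-to-quantile step named in the Results can be illustrated with a minimal sketch. It is not the BMIQ implementation: it assumes the beta-mixture fit has already assigned a type2 probe to one methylation state, it uses made-up shape parameters for that state's type2 and type1 beta distributions, and it leaves out both the mixture fitting and the dilation transformation. All function names are hypothetical, and the numerical routines assume shape parameters greater than 1.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; assumes a > 1 and b > 1 (zero at the ends)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_cdf(x, a, b, steps=2000):
    """Beta(a, b) CDF via composite Simpson integration on [0, x] (steps even)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

def beta_ppf(p, a, b, tol=1e-9):
    """Quantile of Beta(a, b) at probability p, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def map_to_type1(beta2, type2_params, type1_params):
    """Map one type2 beta-value onto the type1 distribution of the same state.

    The value is turned into a probability under the type2 beta distribution,
    then into the quantile with that probability under the type1 distribution.
    """
    p = beta_cdf(beta2, *type2_params)
    return beta_ppf(p, *type1_params)

# Hypothetical shape parameters for the unmethylated state (not fitted values).
type2_unmethylated = (2.0, 20.0)
type1_unmethylated = (2.0, 40.0)

for value in (0.05, 0.10, 0.15):
    print(value, round(map_to_type1(value, type2_unmethylated, type1_unmethylated), 4))
```

Because both the CDF and the quantile function are increasing, the mapping preserves the rank order of probes within a state. In practice a library routine for the beta distribution would replace the hand-written integration and bisection, which are used here only to keep the sketch free of dependencies.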
