Statistical approaches for the analysis of DNA methylation microarray data

Following the rapid development and adoption of DNA methylation microarray assays, the number of statistical tools for analyzing the resulting large-scale data sets is growing. As with other microarray applications, biases caused by technical issues are a concern. Some of these issues are familiar (e.g., two-color dye bias and probe- and array-specific effects), while others are new (e.g., fragment length bias and bisulfite conversion efficiency). Here, I highlight characteristics of DNA methylation that suggest standard statistical tools developed for other data types may not be directly suitable. I then describe the microarray technologies most commonly in use, along with the methods used for preprocessing the data and obtaining a summary measure of methylation. I conclude with a section on downstream analyses of the data, focusing on methods that model percentage DNA methylation as the outcome and on methods for integrating DNA methylation with gene expression or genotype data.
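To make "summary measure" and "percentage DNA methylation" concrete, here is a minimal sketch for two-channel bead platforms, where each CpG yields a methylated intensity and an unmethylated intensity. It computes two summaries widely used in the literature: the beta value, an estimate of the proportion methylated that is bounded between 0 and 1, and the M-value, an unbounded log2 ratio. The function names are hypothetical, and the offsets (100 for the beta value, 1 for the M-value) are common conventions assumed here, not values taken from this article. Because beta values are bounded, methods that assume unbounded, roughly normal outcomes may fit them poorly. The logit-type M-value is one commonly used alternative scale.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Proportion methylated: M / (M + U + offset), bounded in [0, 1).

    Negative intensities (possible after background correction) are clipped
    to zero; the offset (an assumed convention) stabilizes the ratio when
    both signals are low.
    """
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, offset: float = 1.0) -> float:
    """Log2 ratio of methylated to unmethylated signal (unbounded)."""
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return math.log2((m + offset) / (u + offset))


def beta_to_m(beta: float) -> float:
    """Base-2 logit linking the two scales when offsets are ignored."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


if __name__ == "__main__":
    # Hypothetical intensities for a single CpG site.
    print(beta_value(3000, 1000))
    print(m_value(3000, 1000))
    print(beta_to_m(0.75))
```

For example, with a methylated signal of 3000 and an unmethylated signal of 1000, `beta_value` gives about 0.73, and `beta_to_m(0.75)` equals log2(3), about 1.585. Near 0 and 1 the logit stretches the beta scale, which is why the two summaries can behave differently in downstream models.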
