Batch Effects and Pathway Analysis: Two Potential Perils in Cancer Studies Involving DNA Methylation Array Analysis

Background: DNA methylation microarrays have become an increasingly popular means of studying the role of epigenetics in cancer, although the methods used to analyze these arrays are still being developed and existing methods are not always widely disseminated among microarray users.

Methods: We investigated two problems likely to confront DNA methylation microarray users: (i) batch effects and (ii) the use of widely available pathway analysis software to analyze results. First, DNA taken from individuals exposed to low and high levels of drinking water arsenic was plated twice on Illumina's Infinium 450 K HumanMethylation Array, once in order of exposure (Run One) and again following randomization (Run Two). Second, we conducted simulations in which random CpG sites were drawn from the 450 K array and subjected to pathway analysis using Ingenuity's IPA software.

Results: The majority of differentially methylated CpG sites identified in Run One were due to batch effects; few of these sites were also identified in Run Two. In addition, the pathway analysis software reported many significant associations between our data, which were randomly drawn from the 450 K array, and various diseases and biological functions.

Conclusions: These analyses illustrate the pitfalls of failing to control for chip-specific batch effects and of using pathway analysis software created for gene expression arrays to analyze DNA methylation array data.

Impact: We present evidence that (i) chip-specific effects can simulate plausible differential methylation results and (ii) popular pathway analysis software developed for expression arrays can yield spurious results when used in tandem with methylation microarrays.

Cancer Epidemiol Biomarkers Prev; 22(6); 1052–60. ©2013 AACR.
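The contrast between plating in order of exposure and plating after randomization can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `allocate_to_chips`, the default capacity of 12 samples per chip, and the stratified round-robin scheme are all assumptions chosen for illustration. The idea is to shuffle samples within each exposure group and then deal them across chips, so that exposure level is not confounded with chip.

```python
import random
from collections import defaultdict


def allocate_to_chips(exposure_by_sample, chip_size=12, seed=0):
    """Spread each exposure group across chips (hypothetical helper).

    exposure_by_sample: dict mapping sample ID -> exposure label.
    chip_size: samples per chip (assumed to be 12 here).
    Returns a list of chips, each a list of sample IDs in plating order.
    """
    rng = random.Random(seed)

    # Group sample IDs by exposure label, in a reproducible order.
    groups = defaultdict(list)
    for sample, exposure in sorted(exposure_by_sample.items()):
        groups[exposure].append(sample)

    # Shuffle within each exposure group, then concatenate the groups.
    ordered = []
    for exposure in sorted(groups):
        members = groups[exposure]
        rng.shuffle(members)
        ordered.extend(members)

    # Number of chips needed (ceiling division).
    n_chips = -(-len(ordered) // chip_size)

    # Deal samples round-robin so each group is split across all chips.
    chips = [[] for _ in range(n_chips)]
    for i, sample in enumerate(ordered):
        chips[i % n_chips].append(sample)

    # Randomize the position of samples within each chip as well.
    for chip in chips:
        rng.shuffle(chip)
    return chips


if __name__ == "__main__":
    # Illustrative data only: 12 low-exposure and 12 high-exposure samples.
    samples = {f"S{i:02d}": ("low" if i < 12 else "high") for i in range(24)}
    for n, chip in enumerate(allocate_to_chips(samples, seed=1), start=1):
        n_high = sum(samples[s] == "high" for s in chip)
        print(f"chip {n}: {n_high} high, {len(chip) - n_high} low")
```

With equal group sizes, each chip receives an even split of low- and high-exposure samples, whereas plating in order of exposure would place all low-exposure samples on one chip and all high-exposure samples on the other, making chip effects indistinguishable from exposure effects.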
