Significance analysis and statistical dissection of variably methylated regions.

It has recently been proposed that variation in DNA methylation at specific genomic locations may play an important role in the development of complex diseases such as cancer. Here, we develop 1- and 2-group multiple testing procedures for identifying and quantifying regions of DNA methylation variability. Our method is the first genome-wide statistical significance calculation for increased or differential variability, as opposed to the traditional approach of testing for mean changes. We apply these procedures to genome-wide methylation data obtained from biological and technical replicates and provide the first statistical proof that variably methylated regions exist and are due to interindividual variation. We also show that differentially variable regions in colon tumor and normal tissue show enrichment of genes regulating gene expression, cell morphogenesis, and development, supporting a biological role for DNA methylation variability in cancer.
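As a rough illustration of what a 2-group test for differential variability with a multiple testing correction can look like, the following is a minimal sketch and not the procedure developed in the paper. It assumes a probes-by-samples matrix of methylation values, uses a Levene-style statistic (difference in mean absolute deviation from the group median), obtains p-values by permuting group labels, and applies the Benjamini-Hochberg adjustment. The function names `bh_adjust`, `variability_stat`, and `differential_variability` are hypothetical, and the sketch tests single probes rather than regions.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def variability_stat(x, labels):
    """Per-probe difference in mean absolute deviation from the group
    median (group 1 minus group 0). Rows are probes, columns are samples."""
    a = x[:, labels == 0]
    b = x[:, labels == 1]
    dev_a = np.abs(a - np.median(a, axis=1, keepdims=True)).mean(axis=1)
    dev_b = np.abs(b - np.median(b, axis=1, keepdims=True)).mean(axis=1)
    return dev_b - dev_a


def differential_variability(x, labels, n_perm=500, seed=0):
    """Permutation p-values for per-probe differential variability.

    Assumption: permuting labels tests equality of the two group
    distributions, so a large mean shift can also affect this statistic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    observed = variability_stat(x, labels)
    exceed = np.zeros(x.shape[0])
    for _ in range(n_perm):
        permuted = rng.permutation(labels)
        exceed += np.abs(variability_stat(x, permuted)) >= np.abs(observed)
    # Add-one correction keeps p-values strictly positive.
    pvals = (exceed + 1) / (n_perm + 1)
    return observed, pvals, bh_adjust(pvals)
```

Calling `differential_variability(x, labels)` with `labels` coded as 0 and 1 returns the observed statistics, the raw permutation p-values, and the adjusted p-values. A region-level analysis would additionally need to group neighbouring probes, which this sketch does not attempt.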
