Meffil: efficient normalization and analysis of very large DNA methylation datasets

Abstract

Motivation: DNA methylation datasets are growing ever larger in both sample size and genome coverage. Novel computational solutions are required to handle these data efficiently.

Results: We have developed meffil, an R package designed for efficient quality control, normalization and epigenome-wide association studies of large samples of Illumina Methylation BeadChip microarrays. A complete re-implementation of functional normalization minimizes memory use without increasing running time. Incorporating fixed and random effects within functional normalization, together with automated estimation of functional normalization parameters, reduces technical variation in DNA methylation levels, thus reducing false positive rates and improving power. Support for normalization of datasets distributed across physically different locations, without needing to share biologically-based individual-level data, means that meffil can be used to reduce heterogeneity in meta-analyses of epigenome-wide association studies.

Availability and implementation: https://github.com/perishky/meffil/

Supplementary information: Supplementary data are available at Bioinformatics online.
