Bigmelon: tools for analysing large DNA methylation datasets

Motivation: The datasets generated by DNA methylation analyses are getting bigger. With the release of the HumanMethylationEPIC microarray and of datasets containing thousands of samples, analysing these data in R is becoming impractical because of the large memory requirements. As a result, there is an increasing need for computationally efficient methods for performing meaningful analysis on high-dimensional data.

Results: Here we introduce the bigmelon R package, which provides a memory-efficient workflow that enables users to perform the complex, large-scale analyses required in epigenome-wide association studies (EWAS) without requiring large amounts of RAM. Building on the CoreArray Genomic Data Structure file format and the libraries packaged in the gdsfmt package, we provide a practical workflow that facilitates the reading-in, preprocessing, quality control and statistical analysis of DNA methylation data. We demonstrate the capabilities of the bigmelon package using a large dataset consisting of 1193 human blood samples from the Understanding Society: UK Household Longitudinal Study, assayed on the EPIC microarray platform.

Availability and implementation: The bigmelon package is available on Bioconductor (http://bioconductor.org/packages/bigmelon/). The Understanding Society dataset is available at https://www.understandingsociety.ac.uk/about/health/data upon request.

Supplementary information: Supplementary data are available at Bioinformatics online.
