DNA methylation data by sequencing: experimental approaches and recommendations for tools and pipelines for data analysis

Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark and is involved in various processes in the human cell, including gene regulation and the development of diseases such as cancer. Increasing numbers of DNA methylation sequencing datasets from the human genome are produced using various platforms, from methylated DNA precipitation to whole-genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of the resulting data remains a challenge for the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often unclear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable of completing all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such a pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions. The objective of this review is to describe the basic principles and steps in the analysis of DNA methylation sequencing data, in particular those that have been used for mammalian genomes, and, more importantly, to present and discuss the most prominent computational pipelines that can be used to analyze such data. 
We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and recommend a few tools that are powerful, but still easy enough to use for their own data analysis.

[67]  D. Barros-Silva,et al.  Profiling DNA Methylation Based on Next-Generation Sequencing Approaches: New Insights and Clinical Applications , 2018, Genes.