Methodological aspects of whole-genome bisulfite sequencing analysis

The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation beyond CpG sites and CpG islands. These technologies have opened new avenues for understanding the interplay between epigenetic events, chromatin plasticity and gene regulation. However, processing, managing and mining this huge volume of data requires specialized computational tools and statistical methods that are yet to be standardized. Here, we describe a complete bisulfite sequencing analysis workflow, including recently developed programs, and highlight each of the crucial analysis steps: sequencing quality control, read alignment, methylation scoring, methylation heterogeneity assessment, genomic feature annotation, data visualization and determination of differentially methylated cytosines. Moreover, we discuss the limitations of these technologies and considerations for performing suitable analyses.
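As an illustration of the methylation scoring step, the following minimal sketch computes a per-cytosine methylation level as the fraction of aligned reads that retain a C at that position (unconverted, hence called methylated) among all reads reporting either C or T. This is a commonly used definition, not necessarily the scoring scheme of any particular program covered by the workflow. The function name, the `min_coverage` threshold and the example counts are hypothetical and chosen only for demonstration.

```python
from typing import Optional


def methylation_level(methylated: int, unmethylated: int,
                      min_coverage: int = 5) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    methylated   -- reads showing C (unconverted) at the position
    unmethylated -- reads showing T (converted) at the position
    min_coverage -- hypothetical threshold; sites below it return None
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("read counts must be non-negative")
    coverage = methylated + unmethylated
    # Low-coverage sites give unreliable estimates, so they are skipped.
    if coverage == 0 or coverage < min_coverage:
        return None
    return methylated / coverage


# Hypothetical (chromosome, position) -> (C count, T count) table.
sites = {
    ("chr1", 10468): (9, 1),
    ("chr1", 10470): (2, 1),   # below the coverage threshold
    ("chr1", 10483): (0, 12),
}

levels = {pos: methylation_level(c, t) for pos, (c, t) in sites.items()}

for pos, level in levels.items():
    print(pos, "NA" if level is None else f"{level:.2f}")
```

Returning `None` for poorly covered sites, rather than a ratio computed from a handful of reads, keeps unreliable estimates out of downstream steps such as the determination of differentially methylated cytosines; the appropriate threshold depends on sequencing depth and is left as a parameter here.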
