BSeQC: quality control of bisulfite sequencing experiments

MOTIVATION Bisulfite sequencing (BS-seq) has emerged as the gold standard to study genome-wide DNA methylation at single-nucleotide resolution. Quality control (QC) is a critical step in the analysis pipeline to ensure that BS-seq data are of high quality and suitable for subsequent analysis. Although several QC tools are available for next-generation sequencing data, most of them were not designed to handle QC issues specific to BS-seq protocols. Therefore, there is a strong need for a dedicated QC tool to evaluate and remove potential technical biases in BS-seq experiments.

RESULTS We developed a package named BSeQC to comprehensively evaluate the quality of BS-seq experiments and automatically trim nucleotides with potential technical biases that may result in inaccurate methylation estimation. BSeQC takes standard SAM/BAM files as input and generates bias-free SAM/BAM files for downstream analysis. Evaluation based on real BS-seq data indicates that the use of the bias-free SAM/BAM file substantially improves the quantification of methylation level.

AVAILABILITY AND IMPLEMENTATION BSeQC is freely available at: http://code.google.com/p/bseqc/.
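To make the idea of trimming nucleotides with potential technical biases concrete, the following is a minimal Python sketch of one generic approach: estimate the methylation level at each read position and trim read ends whose level deviates from the rest. It is not BSeQC's implementation or API. The function names (`per_position_methylation`, `trim_bounds`), the median-based rule and the `tolerance` parameter are assumptions made for illustration, and the input is simplified to forward-strand read/reference string pairs without indels rather than SAM/BAM records.

```python
from statistics import median


def per_position_methylation(pairs):
    """Estimate the methylation level at each read position.

    `pairs` is an iterable of (read, reference) strings of equal length,
    assumed to be forward-strand alignments without indels (a simplification).
    At a reference C, a read C counts as methylated and a read T as
    unmethylated (bisulfite-converted); other bases are ignored.
    """
    methylated = {}
    total = {}
    for read, ref in pairs:
        for i, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if g != "C" or r not in "CT":
                continue
            total[i] = total.get(i, 0) + 1
            if r == "C":
                methylated[i] = methylated.get(i, 0) + 1
    return {i: methylated.get(i, 0) / total[i] for i in sorted(total)}


def trim_bounds(levels, read_len, tolerance=0.1):
    """Return the half-open window (start, end) of read positions to keep.

    Hypothetical rule: starting from each read end, drop positions whose
    methylation level differs from the median over all positions by more
    than `tolerance`. Positions without data are treated as unbiased.
    """
    if not levels:
        return 0, read_len
    centre = median(levels.values())

    def unbiased(i):
        return i not in levels or abs(levels[i] - centre) <= tolerance

    start = 0
    while start < read_len and not unbiased(start):
        start += 1
    end = read_len
    while end > start and not unbiased(end - 1):
        end -= 1
    return start, end


# Toy data: the first read position is always unmethylated, the rest mostly methylated.
toy_pairs = [("TCCCCC", "CCCCCC")] * 8 + [("TTTTTT", "CCCCCC")] * 2
toy_levels = per_position_methylation(toy_pairs)
toy_window = trim_bounds(toy_levels, read_len=6)
```

On the toy data, the first position stands apart from the others, so the window to keep starts at the second position. A real tool would additionally need to handle strand, paired-end mates, sequence context and alignment gaps, none of which this sketch attempts.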
