P3BSseq: parallel processing pipeline software for automatic analysis of bisulfite sequencing data

Motivation: Bisulfite sequencing (BSseq) processing is among the most cumbersome next generation sequencing (NGS) applications. Although several BSseq processing tools are available, they are scattered, require obscure parameter settings, and are demanding in both running time and memory usage.

Results: We developed P3BSseq, a parallel processing pipeline for fast, accurate and automatic analysis of BSseq reads. It trims, aligns and annotates the reads, records the intermediate results, assesses bisulfite conversion quality, and generates BED methylome and report files following the NIH standards. P3BSseq outperforms the known BSseq mappers in running time and computer hardware requirements (processing power and memory use), and is optimized to process the upcoming, extended BSseq reads. We optimized the P3BSseq parameters for directional and non-directional libraries, and for single-end and paired-end reads of Whole Genome and Reduced Representation BSseq. P3BSseq is a user-friendly, streamlined solution for BSseq upstream analysis that requires only basic computer and NGS knowledge.

Availability and Implementation: P3BSseq binaries and documentation are available at: http://sourceforge.net/p/p3bsseq/wiki/Home/

Contact: mararabra@yahoo.co.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
