Amplikyzer: Automated methylation analysis of amplicons from bisulfite flowgram sequencing

The Roche 454 GS Junior sequencing platform allows locus-specific DNA methylation analysis using deep bisulfite amplicon sequencing. However, bisulfite-converted DNA reads may contain long T homopolymers, and the main sources of errors on pyrosequencing platforms are homopolymer over- and undercalls. Furthermore, existing tools do not always meet the analysis requirements for complex assay designs with multiple regions of interest (ROIs) from multiple samples. We have developed the amplikyzer software package to address these challenges. It aligns the intensity sequences from standard flowgram format (SFF) files directly to given amplicon reference sequences, without first converting them to nucleotide FASTA format. This avoids the information loss caused by rounding flow intensities, and special measures are taken to process long homopolymers correctly. The software offers a variety of options for analyzing complex multiplexed samples with several ROIs, and it outputs useful statistics and publication-quality analysis plots without mandatory manual interaction. It can therefore be used interactively as well as within automated pipelines. The underlying analysis algorithms, which use a novel hybrid flowgram-DNA sequence representation, are described in detail. We also discuss configuration options and use cases of our open source amplikyzer software and present exemplary results. The software, including required libraries, is available at https://bitbucket.org/svenrahmann/amplikyzer/downloads. Contact: Sven.Rahmann[at]uni-due.de
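
To illustrate why rounding flow intensities can lose information on long homopolymers, the following minimal Python sketch mimics naive base calling on a toy flowgram. It is not amplikyzer's algorithm: the function names, the cyclic flow order "TACG", and the intensity values are assumptions chosen for illustration only.

```python
from itertools import cycle


def call_bases(intensities, flow_order="TACG"):
    """Naive base calling: round each flow intensity to a homopolymer length.

    Hypothetical helper for illustration; assumes non-negative intensities
    and a cyclic flow order.
    """
    bases = []
    for nucleotide, intensity in zip(cycle(flow_order), intensities):
        run_length = int(intensity + 0.5)  # round half up
        bases.append(nucleotide * run_length)
    return "".join(bases)


def rounding_residuals(intensities):
    """Signal discarded by rounding, per flow (intensity minus called length)."""
    return [round(x - int(x + 0.5), 2) for x in intensities]


# Toy flowgram: an ambiguous T signal (6.48) followed by clean A, C, G flows.
flows = [6.48, 0.03, 1.02, 0.97]
print(call_bases(flows))          # TTTTTTCG
print(rounding_residuals(flows))  # [0.48, 0.03, 0.02, -0.03]
```

In this toy example the first flow is called as six T's although its intensity is almost equally compatible with seven. Once the read is stored as nucleotides, that uncertainty is gone, whereas an aligner working on the intensities can still take it into account.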
