BHap: a novel approach for bacterial haplotype reconstruction

MOTIVATION
Bacterial haplotype reconstruction is critical for selecting proper treatments for diseases caused by unknown haplotypes. Existing methods and tools do not work well on this task because they were usually developed for viral rather than bacterial populations.

RESULTS
In this study, we developed BHap, a novel algorithm based on fuzzy flow networks, for reconstructing bacterial haplotypes from next-generation sequencing data. Tested on simulated and experimental datasets, BHap reconstructed the haplotypes of bacterial populations with an average F1 score of 0.87, an average precision of 0.87, and an average recall of 0.88. We also demonstrated that BHap had low susceptibility to sequencing errors, was capable of reconstructing haplotypes at low coverage, and could handle a wide range of mutation rates. Compared with existing approaches, BHap achieved higher F1 scores, better precision, and better recall, and it estimated the number of haplotypes more accurately.

AVAILABILITY
The BHap tool is available at http://www.cs.ucf.edu/~xiaoman/BHap/.

SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
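To make the reported evaluation metrics concrete, here is a minimal Python sketch of how precision, recall, and F1 score can be computed for a set of reconstructed haplotypes. It assumes, purely for illustration, that each haplotype is a sequence string and that a reconstructed haplotype counts as correct only if it exactly matches a true haplotype; the matching criterion used in BHap's evaluation is not stated here, and the function name `haplotype_prf` is hypothetical.

```python
def haplotype_prf(predicted, truth):
    """Return (precision, recall, f1) for reconstructed vs. true haplotypes.

    Assumption (illustrative only): haplotypes are strings, and a prediction
    is a true positive only if it exactly matches a true haplotype.
    """
    pred_set = set(predicted)
    true_set = set(truth)
    tp = len(pred_set & true_set)  # correctly reconstructed haplotypes
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Usage: three true haplotypes, four predictions, two of which are correct.
truth = ["ACGTAC", "ACGTTC", "AGGTAC"]
predicted = ["ACGTAC", "ACGTTC", "ACGAAC", "TCGTAC"]
p, r, f = haplotype_prf(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# prints "precision=0.50 recall=0.67 f1=0.57"
```

Note that an F1 score averaged over several datasets need not equal the F1 computed from the averaged precision and recall, so the three averages reported above are not expected to satisfy the harmonic-mean formula exactly.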
