PHYSER: An Algorithm to Detect Sequencing Errors from Phylogenetic Information

Sequencing errors can be difficult to detect because new data are produced at a rate that makes manual curation infeasible. To address this problem, we have developed a phylogenetically inspired algorithm that assesses the quality of new sequences given a related phylogeny. Its performance and efficiency have been evaluated on human mitochondrial DNA data.
