A New Linear-Time Heuristic Algorithm for Computing the Parsimony Score of Phylogenetic Networks: Theoretical Bounds and Empirical Performance

Phylogenies play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks, rather than trees, of relationships. In a series of papers, we have extended the maximum parsimony (MP) criterion to phylogenetic networks, demonstrated its appropriateness, and established the intractability of the problem of scoring the parsimony of a phylogenetic network. In this work we show hardness of approximation for the general case of the problem, devise a very fast (linear-time) heuristic algorithm for it, and evaluate the heuristic on both simulated and biological data.
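To make "scoring the parsimony" concrete, the following minimal sketch computes the parsimony score of a single character on a rooted binary tree using Fitch's classical algorithm. It is not the linear-time heuristic described in this work. The helper `network_site_score` assumes the commonly used definition in which a network's score for a site is the minimum score over the trees it displays, and it assumes those trees are supplied explicitly. The nested-tuple tree encoding and all function names are illustrative choices.

```python
from typing import Dict, List


def fitch_score(tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on a rooted binary tree.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    `states` maps each leaf name to its observed character state.
    """
    def visit(node):
        # Leaf: its state set is the single observed state, at zero cost.
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def network_site_score(displayed_trees: List, states: Dict[str, str]) -> int:
    """Assumed network score for one site: the best score over the given trees."""
    return min(fitch_score(t, states) for t in displayed_trees)


# Hypothetical four-taxon example with one nucleotide site.
site = {"a": "A", "b": "C", "c": "A", "d": "C"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
score_1 = fitch_score(tree_1, site)
score_2 = fitch_score(tree_2, site)
best = network_site_score([tree_1, tree_2], site)
```

Enumerating displayed trees this way grows exponentially with the number of reticulation events, which is consistent with the intractability the abstract refers to and is the motivation for a fast heuristic.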

[1]  Dorit S. Hochbaum,et al.  Approximation Algorithms for NP-Hard Problems , 1996 .

[2]  T. D. Read,et al.  Role of Mobile DNA in the Evolution of Vancomycin-Resistant Enterococcus faecalis , 2003, Science.

[3]  R. Olmstead,et al.  A survey of tricolpate (eudicot) phylogenetic relationships. , 2004, American journal of botany.

[4]  Reuven Bar-Yehuda,et al.  A Local-Ratio Theorem for Approximating the Weighted Vertex Cover Problem , 1983, WG.

[5]  Wing-Kin Sung,et al.  Reconstructing Recombination Network from Sequence Data: The Small Parsimony Problem , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[6]  J. Lake,et al.  Horizontal gene transfer in microbial genome evolution. , 2002, Theoretical population biology.

[7]  J. Hein Reconstructing evolution of sequences subject to recombination using parsimony. , 1990, Mathematical biosciences.

[8]  Johan Håstad,et al.  Some optimal inapproximability results , 2001, JACM.

[9]  J A Eisen,et al.  Assessing evolutionary relationships among microbes from whole-genome analysis. , 2000, Current opinion in microbiology.

[10]  Vineet Bafna,et al.  Improved Recombination Lower Bounds for Haplotype Data , 2005, RECOMB.

[11]  Daniel H. Huson,et al.  Reconstruction of Reticulate Networks from Gene Trees , 2005, RECOMB.

[12]  Jeffrey D. Palmer,et al.  Widespread horizontal transfer of mitochondrial genes in flowering plants , 2003, Nature.

[13]  J. Lake,et al.  Horizontal gene transfer accelerates genome innovation and evolution. , 2003, Molecular biology and evolution.

[14]  W. Fitch Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology , 1971 .

[15]  J. Hein A heuristic method to reconstruct the history of sequences subject to recombination , 1993, Journal of Molecular Evolution.

[16]  W. Doolittle,et al.  How big is the iceberg of which organellar genes in nuclear genomes are but the tip? , 2003, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[17]  Tandy J. Warnow,et al.  Phylogenetic networks: modeling, reconstructibility, and accuracy , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[18]  Vladimir Makarenkov,et al.  Phylogenetic Network Construction Approaches , 2006 .

[19]  John M. Mellor-Crummey,et al.  Reconstructing phylogenetic networks using maximum parsimony , 2005, 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05).

[20]  Reuven Bar-Yehuda,et al.  One for the Price of Two: a Unified Approach for Approximating Covering Problems , 1998, Algorithmica.

[21]  S HochbaDorit Approximation Algorithms for NP-Hard Problems , 1997 .

[22]  Jerrold I. Davis,et al.  Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes. , 2003, American journal of botany.

[23]  Wing-Kin Sung,et al.  Constructing a Smallest Refining Galled Phylogenetic Network , 2005, RECOMB.

[24]  J. Palmer,et al.  Rampant horizontal transfer and duplication of rubisco genes in eubacteria and plastids. , 1996, Molecular biology and evolution.

[25]  Michael T. Hallett,et al.  Simultaneous identification of duplications and lateral transfers , 2004, RECOMB.

[26]  M. Kimura A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences , 1980, Journal of Molecular Evolution.

[27]  Piotr Berman,et al.  A 2-Approximation Algorithm for the Undirected Feedback Vertex Set Problem , 1999, SIAM J. Discret. Math..

[28]  Hervé Philippe,et al.  Archaeal phylogeny based on ribosomal proteins. , 2002, Molecular biology and evolution.

[29]  Andrew Rambaut,et al.  Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees , 1997, Comput. Appl. Biosci..

[30]  D. Sankoff Minimal Mutation Trees of Sequences , 1975 .

[31]  Bernard M. E. Moret,et al.  Network ( Reticulate ) Evolution : Biology , Models , and Algorithms , 2004 .

[32]  T. Tuller,et al.  Inferring phylogenetic networks by the maximum parsimony criterion: a case study. , 2006, Molecular biology and evolution.

[33]  Bernard M. E. Moret,et al.  Network (Reticulated) Evolution: Biology, Models, and Algorithms , 2004 .

[34]  Tandy J. Warnow,et al.  Reconstructing reticulate evolution in species: theory and practice , 2004, RECOMB.

[35]  Sagi Snir,et al.  Efficient parsimony-based methods for phylogenetic network reconstruction , 2007, Bioinform..

[36]  Dan Gusfield,et al.  A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters , 2005, RECOMB.