Phylogenetic Reconstruction from Gene-Rearrangement Data with Unequal Gene Content

Phylogenetic reconstruction from gene-rearrangement data has attracted increasing attention over the last five years. Existing methods are limited computationally and by the assumption (highly unrealistic in practice) that all genomes have the same gene content. We have recently shown that our reconstruction tool, GRAPPA, scales to instances with up to a thousand genomes with no loss of accuracy and at minimal computational cost. Computing genomic distances between two genomes with unequal gene content has seen much progress recently, but that progress has not yet been reflected in phylogenetic reconstruction methods. In this paper, we present extensions to GRAPPA that can handle a limited number of duplications (one of the main requirements for analyzing genomic data from organelles) and a few deletions. Although GRAPPA is based on exhaustive search, we show that, in practice, our bounding functions suffice to prune away almost all of the search space (our pruning rates never fall below 99.995%), resulting in high accuracy and fast running times. The range of values over which we have tested our approach encompasses mitochondrial and chloroplast genomes, whose phylogenetic analysis is providing new insights into evolution.
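To give a rough sense of what a pruning rate of at least 99.995% means for an exhaustive search over tree space, the sketch below counts the unrooted binary trees on n labeled genomes, which is (2n - 5)!! for n >= 3, and bounds how many of them could remain for full scoring. It assumes "pruning rate" means the fraction of candidate trees eliminated by the bounding functions before full scoring. The helper names are hypothetical and this is back-of-the-envelope arithmetic, not GRAPPA code.

```python
import math
from fractions import Fraction


def num_unrooted_binary_trees(n: int) -> int:
    """Count distinct unrooted binary trees on n labeled leaves: (2n - 5)!! for n >= 3."""
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 leaves")
    # (2n - 5)!! is the product of the odd numbers 1, 3, 5, ..., 2n - 5.
    return math.prod(range(1, 2 * n - 4, 2))


def max_trees_scored(n: int, pruning_rate: str) -> int:
    """Upper bound on trees left for full scoring when at least `pruning_rate`
    of all candidate trees are eliminated by the bounding functions.

    Hypothetical helper (assumed meaning of "pruning rate"). The rate is given
    as a decimal string so Fraction keeps the arithmetic exact.
    """
    total = num_unrooted_binary_trees(n)
    surviving_fraction = 1 - Fraction(pruning_rate)
    # The number of surviving trees is an integer, so the bound rounds down.
    return math.floor(total * surviving_fraction)


if __name__ == "__main__":
    for n in (10, 13):
        print(n, num_unrooted_binary_trees(n), max_trees_scored(n, "0.99995"))
```

For 10 genomes there are 2,027,025 candidate trees, so a pruning rate of at least 99.995% leaves at most 101 of them to be scored in full; for 13 genomes the figures are 13,749,310,575 and 687,465. Exact fractions are used so that the bound does not depend on floating-point rounding of 1 - 0.99995.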
