Steps toward accurate reconstructions of phylogenies from gene-order data

We report on our progress in reconstructing phylogenies from gene-order data. We have developed polynomial-time methods for estimating genomic distances that greatly improve the accuracy of trees obtained using the popular neighbor-joining method; we have also further improved the running time of our GRAPPA software suite through a combination of tighter bounding and better use of the bounds. We present new experimental results, extending those we presented at ISMB'01 and WABI'01, that demonstrate the accuracy and robustness of our distance estimators under a wide range of model conditions. Moreover, using the best of our distance estimators (EDE) in GRAPPA, along with more sophisticated bounding techniques, further improved an already huge speedup: whereas our earlier experiments showed a one-million-fold speedup (when run on a 512-processor cluster), our latest experiments demonstrate a one-hundred-million-fold speedup. Together, these advances enabled us to conduct new phylogenetic analyses of a subset of the Campanulaceae family, confirming various conjectures about the relationships among members of the subset and confirming that inversion can be viewed as the principal mechanism of evolution for their chloroplast genome. We give representative results of the extensive experiments we conducted on both real and simulated datasets to validate and characterize our approaches.
