Reconstructing phylogenies from gene-content and gene-order data

Gene-order data have been used successfully to reconstruct organellar phylogenies. They offer low error rates, the potential to reach farther back in time than DNA sequences allow (because genome-level events are rarer than DNA point mutations), and immunity from the so-called gene-tree vs. species-tree problem (caused by the fact that the evolutionary history of specific genes is not isomorphic to that of the organism as a whole). They have also led to deep mathematical and algorithmic results on permutations and on shortest sequences of operations on these permutations. Recent developments include generalizations to handle insertions, duplications, and deletions; scaling to large numbers of organisms and, to a lesser extent, to larger genomes; and the first Bayesian approach to the reconstruction problem. We survey the state of the art in using such data for phylogenetic reconstruction, focusing on recent work by our group that has enabled us to handle arbitrary insertions, duplications, and deletions of genes, as well as inversions of gene subsequences. We conclude with a list of research questions (mathematical, algorithmic, and biological) that will need to be addressed in order to realize the full potential of this type of data.
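
To make the permutation view concrete, the following minimal sketch (not taken from the surveyed work) models a genome as a signed permutation, applies an inversion to a gene subsequence, and counts breakpoints between two gene orders. It assumes a single linear chromosome, equal gene content in both genomes, and genes labeled 1..n; the function names `invert` and `breakpoints` are illustrative only.

```python
def invert(genome, i, j):
    """Return a copy of `genome` with positions i..j (inclusive) inverted.

    An inversion reverses the order of the segment and flips the sign
    (strand) of every gene in it.
    """
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def _frame(genome):
    # Assumption: linear chromosome, capped with 0 on the left and n+1 on the right.
    return [0] + list(genome) + [len(genome) + 1]


def breakpoints(a, b):
    """Count adjacencies of signed genome `a` that do not occur in `b`.

    An adjacency (x, y) is treated as the same as (-y, -x), since reading
    the chromosome from the other end yields the reversed, negated pair.
    """
    framed_b = _frame(b)
    adjacencies_b = set()
    for x, y in zip(framed_b, framed_b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))
    framed_a = _frame(a)
    return sum((x, y) not in adjacencies_b
               for x, y in zip(framed_a, framed_a[1:]))


ancestor = [1, 2, 3, 4, 5]
descendant = invert(ancestor, 1, 3)       # invert the segment 2 3 4
print(descendant)                         # [1, -4, -3, -2, 5]
print(breakpoints(ancestor, descendant))  # 2
```

A single inversion creates at most two breakpoints, which is why the breakpoint count is a cheap proxy for the number of rearrangement events; it is not the inversion distance itself, and it does not cover the unequal gene content (insertions, duplications, deletions) that the abstract highlights.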