Maximum likelihood phylogenetic reconstruction using gene order encodings

Gene order changes under rearrangement events such as inversions and transpositions have attracted increasing attention as a new type of data for phylogenetic analysis. Because these events are rare, they allow evolutionary history to be reconstructed far back in time. Many software packages have been developed for inferring gene order phylogenies, including widely used maximum parsimony methods such as GRAPPA and MGR. However, these methods have great difficulty handling the large nuclear genomes that are now emerging. In this study, we propose three simple yet powerful maximum likelihood (ML) based methods for phylogenetic reconstruction. Each method first encodes the gene orders as binary or multistate strings based on the gene adjacency information present in the given genomes, then converts these strings into molecular sequences; RAxML is finally used to compute the maximum likelihood phylogeny. We conducted extensive experiments on simulated datasets and found that the multistate encoding, although more complex and more time-consuming, did not improve accuracy over the methods using simpler binary encodings. Among all methods tested in our experiments, MLBE is the most accurate in most cases and often returns phylogenies without errors. The ML methods are also fast: even in the most difficult case they take at most three days to analyze datasets of 40 genomes, which makes them well suited to large-scale analysis. We thus provide three simple and robust phylogenetic reconstruction methods that use different encodings and are based on maximum likelihood, a criterion that had not previously been applied successfully to gene order data. These ML methods show great potential for gene order analysis because of their high accuracy and stability, although a formal mathematical and statistical analysis of them is still much needed.
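
The binary adjacency encoding described above can be illustrated with a minimal Python sketch. It is one plausible reading of the idea, not the implementation used in this study: the function names, the head/tail extremity convention, the treatment of genomes as circular by default, and the mapping of 1 to A and 0 to C are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Extremity = Tuple[int, str]          # (gene id, "h" for head or "t" for tail)
Adjacency = FrozenSet[Extremity]     # unordered pair of neighbouring extremities


def extremities(gene: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene as it is read."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome: List[int], circular: bool = True) -> Set[Adjacency]:
    """Collect the adjacencies of one signed gene order."""
    n = len(genome)
    last = n if circular else n - 1
    result: Set[Adjacency] = set()
    for i in range(last):
        right_of_a = extremities(genome[i])[1]
        left_of_b = extremities(genome[(i + 1) % n])[0]
        result.add(frozenset((right_of_a, left_of_b)))
    return result


def binary_encode(genomes: Dict[str, List[int]], circular: bool = True) -> Dict[str, str]:
    """Encode each genome as a 0/1 string: one column per adjacency seen in any genome."""
    per_genome = {name: adjacencies(order, circular) for name, order in genomes.items()}
    # Fix a deterministic column order over the union of all adjacencies.
    columns = sorted(set().union(*per_genome.values()), key=lambda a: sorted(a))
    return {
        name: "".join("1" if col in adjs else "0" for col in columns)
        for name, adjs in per_genome.items()
    }


def to_sequence(bits: str, present: str = "A", absent: str = "C") -> str:
    """Map a 0/1 string to nucleotide characters (the A/C choice is an assumption)."""
    return "".join(present if b == "1" else absent for b in bits)


if __name__ == "__main__":
    example = {"G1": [1, 2, 3], "G2": [1, -2, 3]}  # G2 differs by one inversion
    for name, bits in binary_encode(example).items():
        print(name, bits, to_sequence(bits))
```

The resulting character strings, one per genome, form an alignment-like matrix that could then be written in a format accepted by an ML program such as RAxML. Using unordered extremity pairs makes the encoding independent of the direction in which a genome is read.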
