Improving Tree Search in Phylogenetic Reconstruction from Genome Rearrangement Data

A major task in evolutionary biology is to determine the ancestral relationships among known species, a process generally referred to as phylogenetic reconstruction. In the past decade, a new type of data based on genome rearrangements has attracted increasing attention from both biologists and computer scientists. Methods for reconstructing phylogenies from genome rearrangement data include distance-based methods, direct optimization methods (GRAPPA and MGR), and Markov Chain Monte Carlo (MCMC) methods (Badger). Extensive testing on simulated and biological datasets has shown that the latter three tools (GRAPPA, MGR, and Badger) are currently the best methods for genome rearrangement phylogeny. However, all of these tools must deal with extremely large search spaces: the total number of possible trees grows exponentially with the number of genomes, which makes the search computationally very expensive. Various heuristics are used to explore the tree space, but none guarantees that the optimum will be found. In this paper, we present a new method for searching the large tree space efficiently. The method is motivated by the concept of particle filtering (also known as Sequential Monte Carlo), which was originally proposed to boost the efficiency of MCMC methods on massive data. We tested and compared the new method on simulated datasets in different scenarios. The results show that the new method achieves a significant improvement in efficiency while still retaining very high topological accuracy.
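
To give a sense of how quickly the tree space grows, the following minimal sketch counts unrooted binary tree topologies on n labeled leaves using the standard formula (2n - 5)!!. The function name is hypothetical and is not part of GRAPPA, MGR, or Badger; restricting the count to unrooted binary trees is an assumption made for illustration only.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves.

    Uses the standard double-factorial formula (2n - 5)!!, valid for n >= 3.
    The function name is illustrative, not taken from any existing tool.
    """
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves for an unrooted binary tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n == 3).
    for k in range(3, 2 * n_leaves - 5 + 1, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_binary_trees(n))
```

Even at 10 genomes there are already more than two million candidate topologies, which is why exhaustive enumeration is impractical and heuristic or sampling-based search is used instead.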
