A Median Solver and Phylogenetic Inference Based on Double-Cut-and-Join Sorting

Genome rearrangement is one of the main evolutionary mechanisms at the genomic level. Phylogenetic analysis based on rearrangements has played a crucial role in biological research over the past decades, especially with the increasing availability of fully sequenced genomes. In general, phylogenetic analysis aims to solve two problems: the small parsimony problem (SPP) and the big parsimony problem (BPP). Maximum parsimony is a popular approach to both, and it relies on iteratively solving an NP-hard problem, the median problem, which asks for a genome minimizing the total rearrangement distance to three given genomes. As a result, current median solvers and the phylogenetic inference methods built on them face serious scalability problems and cannot be applied to data sets with large and distant genomes. In this article, we propose a new median solver for gene order data that combines double-cut-and-join (DCJ) sorting with simulated annealing. Based on this median solver, we build a new phylogenetic inference method that solves both SPP and BPP. Our experimental results show that the new median solver achieves excellent performance on simulated data sets, and that the phylogenetic inference tool built on it outperforms other existing methods.
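To make the two ingredients concrete, the sketch below scores a candidate median by its total DCJ distance to three genomes and searches for a good candidate with simulated annealing under the Metropolis acceptance rule. It is a minimal illustration, not the authors' solver: it assumes each genome is a single circular chromosome given as a list of signed integers over the same genes with no duplicates, for which the DCJ distance is N - C (N genes, C cycles in the adjacency graph). The abstract does not say how the authors use DCJ sorting to generate candidates, so this sketch substitutes random inversions (each a single DCJ operation) as its move set. All function names and parameters (`anneal_median`, `t0`, `cooling`, `steps`) are hypothetical.

```python
import math
import random


def left_end(g):
    """Extremity read first for signed gene g (tail of +g, head of -g)."""
    return 2 * g if g > 0 else 2 * -g + 1


def right_end(g):
    """Extremity read last for signed gene g (head of +g, tail of -g)."""
    return 2 * g + 1 if g > 0 else 2 * -g


def adjacency_map(genome):
    """Map each gene extremity to its neighbour on one circular chromosome."""
    partner = {}
    n = len(genome)
    for i in range(n):
        a, b = right_end(genome[i]), left_end(genome[(i + 1) % n])
        partner[a], partner[b] = b, a
    return partner


def dcj_distance(a, b):
    """DCJ distance N - C for circular genomes on the same N genes,
    where C counts cycles in the adjacency graph of a and b."""
    pa, pb = adjacency_map(a), adjacency_map(b)
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while True:  # alternate a-edges and b-edges until the cycle closes
            y = pa[x]
            seen.update((x, y))
            x = pb[y]
            if x == start:
                break
    return len(a) - cycles


def median_score(candidate, genomes):
    """Total DCJ distance from a candidate median to the input genomes."""
    return sum(dcj_distance(candidate, g) for g in genomes)


def random_reversal(genome, rng):
    """Invert a random segment: one DCJ operation on a circular chromosome."""
    i, j = sorted(rng.sample(range(len(genome) + 1), 2))
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def anneal_median(genomes, t0=2.0, cooling=0.995, steps=5000, seed=0):
    """Simulated annealing over candidate medians with Metropolis acceptance."""
    rng = random.Random(seed)
    # Start from the input genome with the lowest median score.
    current = min(genomes, key=lambda g: median_score(g, genomes))
    cur_score = median_score(current, genomes)
    best, best_score = current, cur_score
    t = t0
    for _ in range(steps):
        cand = random_reversal(current, rng)
        cand_score = median_score(cand, genomes)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse moves with prob exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_score = cand, cand_score
            if cur_score < best_score:
                best, best_score = current, cur_score
        t = max(t * cooling, 1e-9)  # geometric cooling, floored to avoid division by zero
    return best, best_score
```

For example, `anneal_median([[1, 2, 3, 4, 5], [1, -2, 3, 4, 5], [1, 2, -3, 4, 5]])` returns `([1, 2, 3, 4, 5], 2)`. A score of 2 is optimal here, because the DCJ distance is a metric, so any median score is at least half the sum of the pairwise distances, (1 + 1 + 2) / 2 = 2. A realistic solver would also need linear and multichromosomal genomes and a richer move set; annealing is used so the search can escape local minima that a purely greedy descent would get stuck in.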
