On Exploring Genome Rearrangement Phylogenetic Patterns

The study of genome rearrangement is much harder than the corresponding problems on DNA and protein sequences because of the numerous combinatorial structures that arise. By exploring these structures explicitly, the recently developed adequate subgraph theory shows that one family of them, adequate subgraphs, is informative in finding optimal solutions to the rearrangement median problem. Its extension gives rise to the tree scoring method GASTS, which quickly and accurately estimates the number of rearrangement events for any given topology. With a similar motivation, this paper presents solid but still preliminary results on combinatorial structures that are informative in phylogenetic inference. These structures, called rearrangement phylogenetic patterns, provide more insight than algorithmic approaches, and may provide statistical significance for inferred phylogenies and lead to efficient and robust phylogenetic inference methods on large sets of taxa. We explore rearrangement phylogenetic patterns with respect to both the breakpoint distance and the DCJ distance; the latter has a simple formulation and approximates other edit distances well. On four genomes, we prove that a contrasting shared adjacency, where a gene forms one adjacency in two genomes and a different adjacency in the other two genomes, is a rearrangement phylogenetic pattern. Phylogenetic inferences based on the number of occurrences of this pattern are very accurate and robust against short internal edges, as tested on 55,000 datasets simulated by random inversions. Further analysis shows that the number of occurrences of this pattern explains well the variation in the number of rearrangement events over different topologies.
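The contrasting shared adjacency described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions that are not in the abstract: genomes are lists of linear chromosomes of signed integers, the pattern is counted once per gene extremity (head or tail), and an extremity is considered only if it has a neighbor in all four genomes, so telomeres are ignored. The names `extremity_neighbors`, `count_contrasting_shared_adjacencies`, and `SPLITS` are hypothetical.

```python
from collections import Counter


def extremity_neighbors(genome):
    """Map each gene extremity to the extremity adjacent to it.

    genome: list of linear chromosomes, each a list of signed integers.
    An extremity is (gene, 'h') for the head or (gene, 't') for the tail.
    Telomeric extremities get no entry (simplifying assumption).
    """
    neighbors = {}
    for chrom in genome:
        ends = []
        for g in chrom:
            if g > 0:
                ends.append(((g, "t"), (g, "h")))    # read tail -> head
            else:
                ends.append(((-g, "h"), (-g, "t")))  # reversed gene
        # The right end of each gene is adjacent to the left end of the next.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            neighbors[right] = left
            neighbors[left] = right
    return neighbors


# The three unrooted topologies on four genomes, as index pairs.
SPLITS = {
    "AB|CD": ((0, 1), (2, 3)),
    "AC|BD": ((0, 2), (1, 3)),
    "AD|BC": ((0, 3), (1, 2)),
}


def count_contrasting_shared_adjacencies(genomes):
    """Count, per topology, extremities whose neighbor is shared within each
    pair of genomes but differs between the two pairs."""
    maps = [extremity_neighbors(g) for g in genomes]
    counts = Counter({name: 0 for name in SPLITS})
    # Only extremities that have a neighbor in all four genomes are examined.
    common = set(maps[0]).intersection(*maps[1:])
    for ext in common:
        n = [m[ext] for m in maps]
        for name, ((i, j), (k, l)) in SPLITS.items():
            if n[i] == n[j] and n[k] == n[l] and n[i] != n[k]:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    a = [[1, 2, 3, 4]]
    b = [[1, 2, 3, -4]]
    c = [[1, 3, 2, 4]]
    d = [[1, 3, 2, 4]]
    print(count_contrasting_shared_adjacencies([a, b, c, d]))
```

Under these assumptions, the topology with the largest count would be the one this pattern favors. Counting per extremity means a single rearrangement can contribute more than one occurrence, so the counts are best compared across topologies rather than read as event counts.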
