Using Permutation Patterns for Content-Based Phylogeny

When the same set of genes appears in different orders on the chromosomes, it forms a permutation pattern. Permutation patterns have been used to identify potential haplogroups in mammalian data [8], and they have also been used successfully to detect phylogenetic relationships between computer viruses [9]. In this paper we explore the use of these patterns as a measure of content similarity, and we use this measure to infer phylogenies from genome rearrangement data in polynomial time. The method uses a function of the cardinality of the set of common maximal permutation patterns as a proxy for evolutionary “proximity” between genomes. We introduce Pi-logen, a phylogeny tool based on this method. We summarize the results of a feasibility study of this scheme on synthetic data, covering (1) content verification and (2) ancestor prediction. We also successfully infer phylogenies on a series of synthetic datasets and on chloroplast gene-order data from Campanulaceae.
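To make the idea of a pattern-count proximity concrete, here is a minimal sketch under stated assumptions; it is not Pi-logen's implementation. It assumes each genome is a signed or unsigned permutation (every gene occurs exactly once, signs ignored), and it treats a "pattern" as any gene set of size 2 to n-1 that occupies a contiguous window of a gene order. This common-interval view stands in for the paper's maximal permutation patterns, whose exact maximality rule is not reproduced here. The distance is a hypothetical choice: the Jaccard distance between the two genomes' pattern sets, which is one possible function of the cardinality of the common set. The names `patterns`, `common_patterns`, `pattern_distance` and `distance_matrix` are illustrative only.

```python
from itertools import combinations


def patterns(order, min_size=2):
    """Gene sets occupying a contiguous window of `order` (any internal order).

    Assumption: `order` is a permutation (each gene once); strand signs are
    dropped. Windows of size 1 and the whole genome are skipped because every
    genome with the same gene content shares them trivially.
    """
    genes = [abs(g) for g in order]
    n = len(genes)
    return {
        frozenset(genes[i:j])
        for i in range(n)
        for j in range(i + min_size, n + 1)
        if j - i < n
    }


def common_patterns(a, b):
    """Gene sets that are contiguous, in any order, in both genomes."""
    return patterns(a) & patterns(b)


def pattern_distance(a, b):
    """Hypothetical proximity proxy: Jaccard distance between pattern sets.

    0.0 means every pattern is shared; 1.0 means no pattern is shared.
    """
    pa, pb = patterns(a), patterns(b)
    union = pa | pb
    if not union:  # genomes too short to contain an informative window
        return 0.0
    return 1.0 - len(pa & pb) / len(union)


def distance_matrix(genomes):
    """Pairwise distances for a dict {taxon name: gene order}."""
    names = sorted(genomes)
    return {
        (x, y): pattern_distance(genomes[x], genomes[y])
        for x, y in combinations(names, 2)
    }


if __name__ == "__main__":
    genomes = {
        "A": [1, 2, 3, 4, 5],
        "B": [2, 1, 3, 4, 5],       # one adjacent swap relative to A
        "C": [-5, -4, -3, -2, -1],  # A inverted as a whole
    }
    print(distance_matrix(genomes))
    # → {('A', 'B'): 0.5, ('A', 'C'): 0.0, ('B', 'C'): 0.5}
```

Each genome of n genes yields O(n^2) windows, so the pairwise computation stays polynomial. A whole-genome inversion (C) leaves every pattern intact and scores 0.0, while a single adjacent swap (B) breaks three of A's nine patterns and adds three new ones. The resulting matrix could then be handed to any standard distance-based tree method (for example neighbor joining); that step, too, is an assumption rather than something stated above.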

[1] J. Nadeau, et al. Lengths of chromosomal segments conserved since divergence of man and mouse. Proceedings of the National Academy of Sciences of the United States of America, 1984.

[2] David Sankoff, et al. Edit Distances for Genome Comparisons Based on Non-Local Operations. CPM, 1992.

[3] D. Sankoff, et al. Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families. 2000.

[4] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis. 1990.

[5] Alberto Caprara, et al. Formulations and hardness of multiple sorting by reversals. RECOMB, 1999.

[6] Bernard M. E. Moret, et al. An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae. 2000.

[7] P. Pevzner, et al. Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Research, 2002.

[8] Rita Casadio, et al. Algorithms in Bioinformatics, 5th International Workshop, WABI 2005, Mallorca, Spain, October 3-6, 2005, Proceedings. WABI, 2005.

[9] Piotr Berman, et al. Fast Sorting by Reversal. CPM, 1996.

[10] Haim Kaplan, et al. Faster and simpler algorithm for sorting signed permutations by reversals. SODA '97, 1997.

[11] Pavel A. Pevzner, et al. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. JACM, 1995.

[12] Gad M. Landau, et al. A Combinatorial Approach to Automatic Discovery of Cluster-Patterns. WABI, 2003.

[13] David Sankoff, et al. Efficient Bounds for Oriented Chromosome Inversion Distance. CPM, 1994.

[14] Gad M. Landau, et al. Using PQ Trees for Comparative Genomics. CPM, 2005.

[15] W. Ewens, et al. The chromosome inversion problem. 1982.

[16] Ron Shamir, et al. The median problems for breakpoints are NP-complete. Electron. Colloquium Comput. Complex., 1998.

[17] Andrew Walenstein, et al. Malware phylogeny generation using permutations of code. Journal in Computer Virology, 2005.