Rearrangement Phylogeny of Genomes in Contig Form

Genome sequencing projects increasingly trade depth for breadth: the phylogenetic scope of sequencing is expanding while the quality of the published sequence for each genome declines. With reduced finishing effort, a growing number of genomes are published in contig form. Rearrangement algorithms, including gene order-based phylogenetic tools, require whole-genome data on gene order, segment order, or the order of some other marker; items whose chromosomal location is unknown cannot be part of the input. The question we address here is how rearrangement algorithms can be used for gene order-based phylogenetic analysis when genomes are available in contig form only. Our suggestion is to use the contigs directly in the rearrangement algorithms as if they were chromosomes, while making a number of corrections. For example, we correct for the extra fusion/fission operations required to make contigs comparable to full assemblies. We model the relationship between contig number and genomic distance, and estimate the parameters of this model using insect genome data. With this model, we use distance matrix methods to reconstruct the phylogeny from genomic distances and numbers of contigs. We compare this approach with methods that reconstruct ancestral gene orders from uncorrected contig data.
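To make the idea of treating contigs as chromosomes concrete, the following is a minimal Python sketch, not the method of the paper. It assumes double-cut-and-join (DCJ) as the rearrangement distance, computed with the standard formula d = N - (C + I/2) over cycles C and odd paths I, and it assumes each genome is given as a list of linear contigs of signed gene numbers. The function names and the `naive_correction` rule are hypothetical; the paper's fitted model relating contig number to distance is not reproduced here.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbour, or None at a contig end.

    genome: list of non-empty linear contigs, each a list of signed ints.
    An extremity is (gene, 't') for the tail or (gene, 'h') for the head.
    """
    nbr = {}
    for contig in genome:
        ends = []
        for g in contig:
            gene = abs(g)
            # A negative sign means the gene is read head-first.
            ends += [(gene, 't'), (gene, 'h')] if g > 0 else [(gene, 'h'), (gene, 't')]
        nbr[ends[0]] = None           # left telomere of the contig
        nbr[ends[-1]] = None          # right telomere of the contig
        for i in range(1, len(ends) - 1, 2):
            nbr[ends[i]] = ends[i + 1]
            nbr[ends[i + 1]] = ends[i]
    return nbr


def dcj_distance(a, b):
    """DCJ distance between two genomes on the same gene set."""
    na, nb = adjacencies(a), adjacencies(b)
    if set(na) != set(nb):
        raise ValueError("genomes must contain the same genes")
    seen = set()
    cycles = odd_paths = 0
    for start in na:
        if start in seen:
            continue
        # Walk one connected component, following adjacencies of both genomes.
        size, is_cycle = 0, True
        stack = [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            size += 1
            for y in (na[x], nb[x]):
                if y is None:
                    is_cycle = False  # a telomere means this is a path
                elif y not in seen:
                    seen.add(y)
                    stack.append(y)
        if is_cycle:
            cycles += 1
        elif size % 2 == 1:
            odd_paths += 1            # path joining a telomere of a to one of b
    n_genes = len(na) // 2
    return n_genes - (cycles + odd_paths // 2)


def naive_correction(d, extra_a, extra_b):
    """Hypothetical placeholder: discount one operation per surplus contig.

    extra_a, extra_b: number of contigs beyond the true chromosome count.
    This is a crude upper bound on artifactual fusions/fissions, not the
    fitted model described in the text.
    """
    return max(0, d - (extra_a + extra_b))


assembled = [[1, 2, 3, 4]]
in_contigs = [[1, 2], [3, 4]]         # same gene order, broken into two contigs
d = dcj_distance(assembled, in_contigs)
print(d, naive_correction(d, 0, 1))
```

In this toy case the two genomes have identical gene order, yet the raw distance is nonzero because one fusion is needed to join the contigs. That inflation is the effect a contig-number correction is meant to remove.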
