Towards Phylogenomic Reconstruction

Reconstructing phylogenies is one of the primary objectives in evolutionary studies. Efficient software for reconstructing phylogenies from isolated genes has existed for decades, yet phylogenetic reconstruction from whole genomes is only beginning. The diversification of genome sequencing projects has generated thousands of whole genomes, making phylogenomic reconstruction a challenging research topic. In this paper, we present an approach to pairwise alignment construction that uses both nucleotide and locus (a segment of nucleotides) operations to minimize the total edit cost between genomes. The cost is composed of three factors: nucleotide transformation costs between loci, indel costs of loci, and rearrangement costs between locus orders. This approach is embedded within a direct optimization scheme to reconstruct phylogenies from whole unaligned genomes. We demonstrate the performance of this approach in our software, POY4, by reconstructing phylogenies from Coronavirus and Poxvirus genomes.
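
The three-part cost can be illustrated with a minimal Python sketch. It is not the POY4 implementation: it scores one fixed, user-supplied locus matching instead of searching for the minimizing one, it uses unit-cost edit distance for the nucleotide term, and it substitutes an unsigned breakpoint count for the rearrangement term. The function names and the weights `locus_indel` and `breakpoint_cost` are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: str, b: str, sub: int = 1, gap: int = 1) -> int:
    """Nucleotide-level edit distance between two loci (row-by-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (0 if ca == cb else sub),  # match or substitution
                prev[j] + gap,                           # gap in b
                curr[j - 1] + gap,                       # gap in a
            ))
        prev = curr
    return prev[-1]


def breakpoint_count(order: Sequence[int]) -> int:
    """Unsigned breakpoints of a permutation of 0..n-1, framed by -1 and n."""
    framed = [-1, *order, len(order)]
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(y - x) != 1)


def genome_edit_cost(
    g1: List[str],
    g2: List[str],
    matching: Dict[int, int],
    locus_indel: int = 10,      # assumed cost of inserting/deleting one locus
    breakpoint_cost: int = 5,   # assumed cost per breakpoint
) -> int:
    """Total cost for one fixed matching (index in g1 -> index in g2)."""
    # 1. Nucleotide transformation cost between matched loci.
    nucleotide = sum(edit_distance(g1[i], g2[j]) for i, j in matching.items())
    # 2. Indel cost for loci left unmatched in either genome.
    unmatched = (len(g1) - len(matching)) + (len(g2) - len(matching))
    indel = locus_indel * unmatched
    # 3. Rearrangement cost: order of matched loci in g2, read along g1.
    js = [matching[i] for i in sorted(matching)]
    rank = {j: r for r, j in enumerate(sorted(js))}
    order = [rank[j] for j in js]
    rearrangement = breakpoint_cost * breakpoint_count(order)
    return nucleotide + indel + rearrangement


g1 = ["ACGT", "GGCC", "TTAA"]
g2 = ["TTAA", "ACGA", "CCCC"]
cost = genome_edit_cost(g1, g2, {0: 1, 2: 0})
```

In this example the matched loci differ by one substitution, one locus in each genome is unmatched, and the two matched loci appear in swapped order, so all three terms contribute. A full method along the lines described above would additionally minimize this total over candidate matchings.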
