An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage

Background: Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large scale alignments, and many of these require a high stringency pre-screen for putative orthologs in order to reduce the effective size of the dataset and provide a reasonably high but unknown fraction of correctly aligned homologous sites for comparison. As an alternative, highly efficient methods that do not require the pre-alignment of operationally defined orthologs are also being explored.

Results: A non-alignment method based on the Singular Value Decomposition (SVD) was used to compare the predicted protein complement of nine whole eukaryotic genomes ranging from yeast to man. This analysis resulted in the simultaneous identification and definition of a large number of well conserved motifs and gene families, and produced a species tree supporting one of two conflicting hypotheses of metazoan relationships.

Conclusions: Our SVD-based analysis of the entire protein complement of nine whole eukaryotic genomes suggests that highly conserved motifs and gene families can be identified and effectively compared in a single coherent definition space for the easy extraction of gene and species trees. While this occurs without the explicit definition of orthologs or homologous sites, the analysis can provide a basis for these definitions.
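The Results describe an alignment-free comparison in which proteins are placed in a common SVD-derived space and species are then compared there. The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions: proteins are represented by overlapping dipeptide counts (`k=2` is chosen only to suit the toy data), a species vector is taken as the sum of its protein vectors, and species are compared by cosine similarity. The sequences, the function names, and the small Jacobi eigen-solver (applied to the Gram matrix as a stand-in for a library SVD routine) are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import atan, cos, pi, sin, sqrt


def peptide_counts(seq, k=2):
    """Count overlapping k-residue peptides in one protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jacobi_eigh(sym, sweeps=100, tol=1e-20):
    """Eigen-decompose a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, vectors); column j of `vectors` pairs with eigenvalue j.
    """
    n = len(sym)
    a = [[float(x) for x in row] for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = pi / 4 if diff == 0.0 else 0.5 * atan(2 * a[p][q] / diff)
                c, s = cos(theta), sin(theta)
                for m in (a, v):  # rotate columns p and q of both matrices
                    for row in m:
                        row[p], row[q] = c * row[p] - s * row[q], s * row[p] + c * row[q]
                for i in range(n):  # rotate rows p and q of a
                    a[p][i], a[q][i] = c * a[p][i] - s * a[q][i], s * a[p][i] + c * a[q][i]
    return [a[i][i] for i in range(n)], v


def species_similarity(species, k=2, rank=None):
    """Cosine similarity between species vectors in a rank-reduced SVD space."""
    owners, counts = [], []
    for name, proteins in species.items():
        for seq in proteins:
            owners.append(name)
            counts.append(peptide_counts(seq, k))
    # Gram matrix G = A A^T of the protein-by-peptide count matrix A.
    gram = [[sum(n * cj[pep] for pep, n in ci.items()) for cj in counts] for ci in counts]
    # Eigenvectors of G are the left singular vectors U; eigenvalues are sigma^2.
    values, vectors = jacobi_eigh(gram)
    order = sorted(range(len(values)), key=lambda j: -values[j])[:rank]
    # Row i of U * Sigma: protein i's coordinates on the retained singular vectors.
    coords = [[vectors[i][j] * sqrt(max(values[j], 0.0)) for j in order]
              for i in range(len(counts))]
    # Assumption: a species vector is the sum of its protein vectors.
    totals = {name: [0.0] * len(order) for name in species}
    for name, vec in zip(owners, coords):
        totals[name] = [t + x for t, x in zip(totals[name], vec)]

    def cosine(u, w):
        nu, nw = sqrt(sum(x * x for x in u)), sqrt(sum(x * x for x in w))
        return sum(x * y for x, y in zip(u, w)) / (nu * nw) if nu and nw else 0.0

    return {(a, b): cosine(totals[a], totals[b]) for a in species for b in species}


# Hypothetical toy "proteomes": sp1 and sp2 differ by one residue per protein.
toy = {
    "sp1": ["MKVLAAGIVG", "GSHMTEYKLV"],
    "sp2": ["MKVLSAGIVG", "GSHMTEYRLV"],
    "sp3": ["MKPLWWGIVG", "QQHMTNNKLV"],
}
sims = species_similarity(toy, k=2, rank=4)
for pair in [("sp1", "sp2"), ("sp1", "sp3")]:
    print(pair, round(sims[pair], 3))
```

With `rank=None` every singular vector is kept, so the similarities reduce to plain cosines of the summed peptide counts. A smaller `rank` keeps only the dominant correlated-peptide directions, which is the point of the decomposition. A pairwise distance such as one minus the cosine could then be passed to a standard tree-building method, though the abstract does not say which distance or method was used.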
