Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance

A new and essentially simple method is proposed for reconstructing prokaryotic phylogenetic trees from complete genome data without sequence alignment. It is based on the frequency of appearance of oligopeptides of a fixed length K (up to K = 6) in each organism's proteome. The method requires neither fine adjustment nor a choice of genes. It can incorporate the effect of lateral gene transfer to some extent and leads to results comparable with the bacteriologists' systematics as reflected in the latest (2001) edition of Bergey's Manual of Systematic Bacteriology. A key point in our approach is the subtraction of a random background, estimated with a Markovian model of order K-2, from the composition vectors in order to highlight the shaping role of natural selection.
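The idea of a background-subtracted composition vector can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the details here are assumptions: the background frequency of a K-string is predicted from the frequencies of its two (K-1)-long substrings and its inner (K-2)-long substring, the vector component is the relative deviation from that prediction, and two vectors are compared with a cosine-based distance. The function names (`kmer_freqs`, `composition_vector`, `distance`) and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def kmer_freqs(proteins, k):
    """Relative frequencies of all length-k substrings in a list of sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def composition_vector(proteins, k):
    """Sparse vector of relative deviations (f - f0) / f0, with f0 the
    assumed Markov prediction f(a1..aK-1) * f(a2..aK) / f(a2..aK-1)."""
    if k < 3:
        raise ValueError("this sketch needs k >= 3")
    f_k = kmer_freqs(proteins, k)
    f_k1 = kmer_freqs(proteins, k - 1)
    f_k2 = kmer_freqs(proteins, k - 2)
    # Group (k-1)-strings by their first k-2 letters so overlaps are found fast.
    by_prefix = defaultdict(list)
    for s in f_k1:
        by_prefix[s[:-1]].append(s)
    vec = {}
    for left in f_k1:
        for right in by_prefix[left[1:]]:
            s = left + right[-1]  # candidate K-string with nonzero prediction
            expected = f_k1[left] * f_k1[right] / f_k2[left[1:]]
            vec[s] = (f_k.get(s, 0.0) - expected) / expected
    return vec


def distance(u, v):
    """Assumed distance (1 - C) / 2, where C is the cosine of the two vectors."""
    dot = sum(x * v.get(s, 0.0) for s, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.5  # no signal left after subtraction: treat as uncorrelated
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


if __name__ == "__main__":
    # Toy "proteomes"; real input would be all protein sequences of a genome.
    proteome_a = ["MKVLAAGIVKMKVLA", "GIVKAAMKVL"]
    proteome_b = ["MSTNPKPQRKTKRNT", "NRRPQDVKFPGG"]
    vec_a = composition_vector(proteome_a, 3)
    vec_b = composition_vector(proteome_b, 3)
    print(distance(vec_a, vec_b))
```

Under these assumptions, a matrix of pairwise distances between proteomes could then be passed to any standard distance-based tree-building method. Storing the vectors as sparse dictionaries keeps the sketch workable for larger K, where most of the 20^K possible strings never occur.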
