Fractal and Dynamical Language Methods to Construct Phylogenetic Tree Based on Protein Sequences from Complete Genomes

Complete genomes of living organisms provide a great deal of information about their phylogenetic relationships. Over the past few years, we have proposed three alternative methods for modelling the noise background in the composition vector of the protein sequences from a complete genome. The first method uses the frequencies of the 20 amino acids in the genome together with a multiplicative model. The second is based on the iterated function system model from fractal geometry. The third is based on the relationship between a word and its two sub-words in the theory of symbolic dynamics. Here we introduce these methods and test them on complete genomes of prokaryotes and eukaryotes. Our distance-based phylogenetic tree of prokaryotes and eukaryotes agrees with the biologists' “tree of life” based on the 16S-like rRNA genes in a majority of basic branches and most lower taxa.
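
To make the first method more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the multiplicative model predicts the background frequency of a length-k word as the product of the single amino acid frequencies, and that the composition vector stores the relative deviation of the observed frequency from that prediction. The function names, the sparse storage of observed words only, and the cosine-based distance are illustrative assumptions that the abstract does not specify.

```python
from collections import Counter
from math import prod, sqrt


def kstring_frequencies(proteins, k):
    """Observed frequency of every length-k word across all protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def composition_vector(proteins, k):
    """Relative deviation of observed k-word frequencies from the background.

    Assumption: the background frequency of a word is the product of the
    frequencies of its single residues (multiplicative model).
    Simplification: only words that actually occur are stored.
    """
    single = kstring_frequencies(proteins, 1)
    observed = kstring_frequencies(proteins, k)
    vector = {}
    for word, p in observed.items():
        background = prod(single[a] for a in word)
        vector[word] = p / background - 1.0
    return vector


def correlation_distance(u, v):
    """Assumed distance: (1 - cosine similarity) / 2, which lies in [0, 1]."""
    dot = sum(x * v.get(word, 0.0) for word, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.5  # no usable signal in at least one vector
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


# Toy proteomes, invented purely for illustration.
genome_a = ["MKVLAAGIVKV", "MKKLAVGI"]
genome_b = ["MKVLSAGIVRV", "MRKLAVGL"]
vec_a = composition_vector(genome_a, 3)
vec_b = composition_vector(genome_b, 3)
print(correlation_distance(vec_a, vec_b))
```

The pairwise distances computed this way could then be passed to any distance-based tree-building procedure; which procedure is used is left open here.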
