Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses.

By analogy with the chaos game representation (CGR) of DNA sequences proposed by Jeffrey (Nucleic Acids Res. 18 (1990) 2163), a new CGR of protein sequences based on the detailed HP model is proposed. Multifractal and correlation analyses of the measures based on the CGR of protein sequences from complete genomes are performed. The Dq spectra of all organisms studied are multifractal-like and sufficiently smooth for the Cq curves to be meaningful. The Cq curves of bacteria resemble a classical phase transition at a critical point. The correlation distance of the difference between the measure based on the CGR of protein sequences and its fractal background is also proposed as a basis for constructing a more precise phylogenetic tree of bacteria.
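The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not list the amino acid classes or the corner assignment, so the four-class grouping (non-polar, negative polar, uncharged polar, positive polar) and the mapping of classes to corners of the unit square used here are assumptions, and the function names `cgr_points`, `box_measure`, and `partition_sum` are hypothetical. The sketch plots each residue halfway between the previous point and the corner of its class, then bins the points into a grid to obtain a measure whose partition sums are the starting point for a Dq estimate.

```python
from collections import Counter

# ASSUMPTION: a four-class grouping of the 20 amino acids for the detailed HP model.
# The abstract itself does not spell out the classes.
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}

# ASSUMPTION: which class sits at which corner of the unit square.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}

# One-letter residue code -> corner of its class.
RESIDUE_CORNER = {
    aa: CORNERS[name] for name, letters in CLASSES.items() for aa in letters
}


def cgr_points(sequence):
    """Return the chaos game points for a protein sequence (one-letter codes)."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for aa in sequence.upper():
        corner = RESIDUE_CORNER.get(aa)
        if corner is None:
            continue  # skip ambiguous or unknown residue codes
        # Move halfway from the current point towards the corner of the class.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def box_measure(points, k):
    """Fraction of points in each cell of a 2**k by 2**k grid on the unit square."""
    n = 2 ** k
    counts = Counter(
        (min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points
    )
    total = len(points)
    return {box: c / total for box, c in counts.items()}


def partition_sum(measure, q):
    """Sum of p**q over the non-empty boxes of a measure."""
    return sum(p ** q for p in measure.values())
```

For example, `cgr_points("AD")` first moves halfway to the assumed non-polar corner and then halfway to the assumed negative-polar corner. In a standard box-counting treatment, Dq for q ≠ 1 would be estimated from the slope of ln `partition_sum(measure, q)` against (q − 1) ln ε over several grid sizes ε = 2^(−k); that regression step is omitted here.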
