Multifractal Characterisation of Length Sequences of Coding and Noncoding Segments in a Complete Genome

Multifractal analysis is used to characterise the coding and noncoding length sequences constructed from a complete genome. The dimension spectrum Dq and its derivative, the ‘analogous’ specific heat Cq, are calculated for the coding and noncoding length sequences of bacteria, where q is the moment order of the partition sum of the sequences. The shapes of the Dq and Cq curves show a clear difference between the coding/noncoding length sequences of all organisms considered and a completely random sequence. For bacteria, noncoding length sequences are more complex than coding length sequences: almost all Dq curves for coding length sequences are flat, indicating weak multifractality, whereas almost all Dq curves for noncoding length sequences are multifractal-like. The ‘analogous’ specific heats of the noncoding length sequences of bacteria also show a rich variety of behaviour, much more complex than that of the coding length sequences. We propose to characterise the bacteria according to the types of the Cq curves of their noncoding length sequences. This new type of classification allows a better understanding of the relationships among bacteria at the global gene level rather than at the nucleotide sequence level.
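As an illustration of the quantities named above, the following is a minimal sketch of how Dq and Cq might be estimated from a length sequence. It rests on assumptions not stated in the abstract: the lengths t_k are turned into a probability measure mu_k = t_k / sum(t), the boxes are dyadic intervals over the position index, the exponent tau(q) is the fitted slope of ln Z(q) against ln(eps), Dq = tau(q)/(q - 1), and Cq is taken as the negative second difference of tau(q), a common convention in the thermodynamic formalism. The function names (`box_masses`, `tau`, `dimension`, `specific_heat`) are hypothetical.

```python
import numpy as np

def box_masses(lengths, j):
    """Mass of each of the 2**j dyadic boxes (assumed measure: mu_k = t_k / sum t)."""
    t = np.asarray(lengths, dtype=float)
    mu = t / t.sum()
    n = mu.size
    # Element k sits at position k/n in [0, 1); integer arithmetic picks its box.
    idx = (np.arange(n) * 2**j) // n
    return np.bincount(idx, weights=mu, minlength=2**j)

def _levels(lengths):
    # Use box sizes eps = 2**-j with 2**j <= n, so every box holds at least one element.
    n = len(lengths)
    return range(1, n.bit_length())

def tau(lengths, q):
    """Slope of ln Z(q) versus ln(eps), where Z(q) is the sum of box masses to the power q."""
    log_eps, log_z = [], []
    for j in _levels(lengths):
        m = box_masses(lengths, j)
        m = m[m > 0]
        log_eps.append(-j * np.log(2.0))
        log_z.append(np.log(np.sum(m ** q)))
    return np.polyfit(log_eps, log_z, 1)[0]

def dimension(lengths, q):
    """Generalised dimension D_q; q = 1 uses the information-dimension limit."""
    if abs(q - 1.0) < 1e-12:
        log_eps, ent = [], []
        for j in _levels(lengths):
            m = box_masses(lengths, j)
            m = m[m > 0]
            log_eps.append(-j * np.log(2.0))
            ent.append(np.sum(m * np.log(m)))
        return np.polyfit(log_eps, ent, 1)[0]
    return tau(lengths, q) / (q - 1.0)

def specific_heat(lengths, q):
    """Assumed 'analogous' specific heat: C_q = 2*tau(q) - tau(q+1) - tau(q-1)."""
    return 2.0 * tau(lengths, q) - tau(lengths, q + 1) - tau(lengths, q - 1)

if __name__ == "__main__":
    # Synthetic lengths for illustration only, not genome data.
    rng = np.random.default_rng(0)
    lengths = rng.integers(50, 3000, size=1024)
    for q in (-2, 0, 1, 2, 5):
        print(q, dimension(lengths, q), specific_heat(lengths, q))
```

Under these assumptions, a sequence of equal lengths gives a flat curve with Dq close to 1 and Cq close to 0, while a multiplicative cascade gives a Dq that decreases with q. The number of scales and the fitting range are choices of this sketch, and real estimates are sensitive to them.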
