Genomic signature: characterization and classification of species assessed by chaos game representation of sequences.

We explored the DNA structure of genomes by means of a tool derived from the theory of chaotic dynamical systems, the chaos game representation (CGR), which depicts the frequencies of oligonucleotides in the form of images. Using CGR, we observe that subsequences of a genome exhibit the main characteristics of the whole genome, which supports the validity of the genomic signature concept. Base concentrations, stretches (runs of complementary bases or of purines/pyrimidines), and patches (over- or underrepresented words of various lengths) are the main factors explaining the variability observed among sequences. The distance between images may be considered a measure of phylogenetic proximity. Eukaryotes and prokaryotes can be identified merely on the basis of their DNA structures.
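As an illustration of how oligonucleotide frequencies can be laid out as an image and how two such images can be compared, the following is a minimal sketch rather than the authors' implementation. The corner layout in `CORNERS`, the function names `cgr_image` and `image_distance`, and the use of a Euclidean distance are all assumptions made for this example. Each word of length `k` is mapped to one cell of a 2^k by 2^k grid, with the last base of the word selecting the coarsest quadrant, as in the chaos game construction.

```python
import math

# Assumed corner layout (x, y) for the four bases; other conventions exist.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_image(seq, k):
    """Return a 2^k x 2^k grid of relative frequencies of words of length k."""
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip words containing ambiguous symbols such as N.
        if any(base not in CORNERS for base in word):
            continue
        x = y = 0
        # The last base of the word sets the most significant bit,
        # i.e. the largest quadrant of the image.
        for base in reversed(word):
            cx, cy = CORNERS[base]
            x = (x << 1) | cx
            y = (y << 1) | cy
        counts[y][x] += 1
        total += 1
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[c / total for c in row] for row in counts]


def image_distance(img_a, img_b):
    """Euclidean distance between two frequency images of equal size (an assumed metric)."""
    return math.sqrt(
        sum((p - q) ** 2
            for row_a, row_b in zip(img_a, img_b)
            for p, q in zip(row_a, row_b))
    )
```

For example, `image_distance(cgr_image(seq_a, 4), cgr_image(seq_b, 4))` compares two sequences at the tetranucleotide level; a smaller value means the two frequency images are more alike.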
