Genomic signature is preserved in short DNA fragments

The "chaos game representation" (CGR) is used to display the usage of short oligonucleotides in a genome as a fractal image. Such an image can be considered a genomic signature. Using an unsupervised classification approach, it is shown that short fragments of genomic sequence retain most of the characteristics of the species they come from. It thus appears possible to compare species globally using the genome fragments available in databases. The efficiency of this approach is evaluated as a function of fragment size and oligonucleotide length.
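As an illustration of the idea, the following minimal Python sketch builds a CGR-style signature as a 2^k x 2^k grid of oligonucleotide (k-mer) frequencies and compares two fragments. It is not the implementation used in the study: the corner assignment (A, C, G, T), the function names `cgr_signature` and `euclidean_distance`, and the choice of Euclidean distance as the comparison measure are all assumptions made for the example.

```python
from collections import Counter

# Assumed corner assignment (conventions vary between authors):
# A = bottom-left, C = top-left, G = top-right, T = bottom-right.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_signature(seq, k):
    """Return a 2^k x 2^k grid of k-mer relative frequencies.

    Each k-mer is placed in the cell that the chaos game point would
    reach after reading that word, so the grid is a coarse-grained
    CGR image. Words containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]

    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in CORNERS for c in word):
            counts[word] += 1

    total = sum(counts.values())
    if total == 0:
        return grid

    for word, n in counts.items():
        x = y = 0
        for c in word:
            bx, by = CORNERS[c]
            # Each new letter halves the distance to its corner, so it
            # becomes the most significant bit of the cell coordinates.
            x = (x >> 1) | (bx << (k - 1))
            y = (y >> 1) | (by << (k - 1))
        grid[y][x] = n / total
    return grid


def euclidean_distance(a, b):
    """Euclidean distance between two signatures of equal size (assumed metric)."""
    return sum(
        (p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)
    ) ** 0.5


# Hypothetical usage: compare the signatures of two short fragments.
fragment_1 = "ACGTACGTTGCAACGT"
fragment_2 = "AAAATTTTAAAATTTT"
d = euclidean_distance(cgr_signature(fragment_1, 2), cgr_signature(fragment_2, 2))
```

Under these assumptions, a pairwise distance matrix between fragment signatures could be passed to any standard clustering method to mimic an unsupervised classification; varying `k` and the fragment length corresponds to the two parameters whose effect is evaluated.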
