Prokaryote clustering based on DNA curvature distributions

The large-scale sequencing of complete genomes has led to the development of a variety of tools for genome comparison. Our approach is to compare genomes by the genome-wide distribution of a mathematical function that reflects a particular biological function. In this study we carried out a comprehensive analysis of DNA curvature distributions in coding and non-coding regions of prokaryotic genomes to evaluate how useful mathematical and statistical procedures are for such comparisons. Owing to the large amount of data, we were able to identify factors that influence the curvature distribution in promoter and terminator regions, such as growth temperature, genome size, and A+T composition. Two clustering methods, K-means and PAM (partitioning around medoids), were applied and produced very similar clusterings that reflect genomic attributes and the environmental conditions of the species' habitats.
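To make the comparison of the two clustering methods concrete, the following is a minimal sketch in Python and not the procedure used in the study. It assumes each genome is summarized by a short curvature histogram; the genome names, the three-bin feature vectors, the farthest-first initialization, and the use of the Rand index to measure agreement are all illustrative assumptions, since the abstract does not specify them.

```python
import math
from itertools import combinations

# Hypothetical 3-bin curvature histograms (low, medium, high) for six genomes.
# These values are made up for illustration; they are not data from the study.
genomes = {
    "genome_A1": (0.70, 0.20, 0.10),
    "genome_A2": (0.65, 0.25, 0.10),
    "genome_A3": (0.72, 0.18, 0.10),
    "genome_B1": (0.30, 0.30, 0.40),
    "genome_B2": (0.25, 0.35, 0.40),
    "genome_B3": (0.28, 0.30, 0.42),
}
points = list(genomes.values())


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(pts, k):
    """Deterministic seeding: start at index 0, then add the farthest point."""
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(pts)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(dist(pts[i], pts[c]) for c in chosen)))
    return chosen


def kmeans(pts, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = [pts[i] for i in farthest_first(pts, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: dist(p, centers[c])) for p in pts]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, l in zip(pts, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def pam(pts, k):
    """PAM-style swap search: centers are restricted to actual data points."""
    n = len(pts)
    d = [[dist(p, q) for q in pts] for p in pts]
    medoids = farthest_first(pts, k)

    def cost(meds):
        return sum(min(d[i][m] for m in meds) for i in range(n))

    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if cost(trial) < cost(medoids) - 1e-12:
                    medoids, improved = trial, True
    return [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


km_labels = kmeans(points, 2)
pam_labels = pam(points, 2)
print("K-means:", km_labels)
print("PAM:    ", pam_labels)
print("Rand index:", rand_index(km_labels, pam_labels))
```

On well-separated toy data like this the two methods return the same partition. On real curvature distributions they can differ, because K-means averages the members of a cluster while PAM restricts each center to an observed genome, which makes it less sensitive to outliers.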
