Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences

Three results are presented. First, we prove the existence of a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in GenBank in August 2004, and explain its properties. The 7-cluster structure accounts for most of the sequence heterogeneity in bacterial genomes. In this sense, our 7-cluster structure is the basic model of a bacterial genome sequence. We demonstrate that four basic “pure” types of this model are observed in nature: “parallel triangles”, “perpendicular triangles”, the degenerate case, and the flower-like type.