Analysis of codon usage patterns of bacterial genomes using the self-organizing map.

Codon usage varies both between organisms and between different genes in the same organism. This observation has served as the basis for earlier work identifying highly expressed and horizontally transferred genes in Escherichia coli. In this work, we applied Kohonen's self-organizing map to the codon usage patterns of the Escherichia coli, Aquifex aeolicus, Archaeoglobus fulgidus, Haemophilus influenzae Rd, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, and Pyrococcus horikoshii genomes to look for evidence of highly expressed and horizontally transferred genes. All of the analyzed genomes had a clear category of horizontally transferred genes, whose apparent percentages ranged from 7.7% to 21.4%. The apparent percentage of highly expressed genes ranged from 0% to 11.8%. A clustering of the average codon usage of the main gene categories of the seven genomes showed an interesting mixing of gene classes in four thermophilic/hyperthermophilic organisms, A. aeolicus, A. fulgidus, M. thermoautotrophicum, and P. horikoshii, which suggests possible origins of their horizontally transferred genes as well as the need for adaptation to a specific environment. Further classification of the three gene categories in E. coli and H. influenzae according to gene function revealed that genes involved in communication (such as regulation and cell process) and structure (cell structure and structural proteins) are more likely to be horizontally transferred than are genes involved in information (transcription, translation, and related processes) and in some groups of energy (such as energy metabolism and carbon compound catabolism).
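
As an illustration of the general approach, the sketch below converts coding sequences into 64-dimensional codon frequency vectors and trains a small self-organizing map on them. It is a minimal sketch under stated assumptions, not the procedure used in this work: the abstract does not specify the feature encoding, map size, or training schedule, so the raw codon frequencies, the grid dimensions, the linearly decaying learning rate and neighborhood width, and the toy sequences are all assumptions, and the function names (`codon_frequencies`, `best_matching_unit`, `train_som`) are hypothetical.

```python
import itertools
import math
import random

# All 64 codons in a fixed order, so every gene maps to a comparable vector.
CODONS = ["".join(c) for c in itertools.product("ACGT", repeat=3)]


def codon_frequencies(seq):
    """Return a 64-element codon frequency vector for an in-frame sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons with ambiguous bases
            counts[codon] += 1
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CODONS]


def best_matching_unit(weights, vec):
    """Index of the node whose weight vector is closest to vec."""
    return min(
        range(len(weights)),
        key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], vec)),
    )


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns a list of rows*cols weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() / dim for _ in range(dim)]
               for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking radius
        for vec in rng.sample(data, len(data)):  # shuffled pass over genes
            br, bc = divmod(best_matching_unit(weights, vec), cols)
            for k, w in enumerate(weights):
                r, c = divmod(k, cols)
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for j in range(dim):
                    w[j] += lr * h * (vec[j] - w[j])
    return weights


# Toy, made-up sequences used only to exercise the code.
genes = {
    "geneA": "ATGGCTGCTGCTAAAGCTTAA",
    "geneB": "ATGGCAGCAGCAAAGGCATAA",
}
vectors = [codon_frequencies(s) for s in genes.values()]
som = train_som(vectors, rows=2, cols=2, epochs=20)
for name, vec in zip(genes, vectors):
    print(name, divmod(best_matching_unit(som, vec), 2))
```

Genes that land on the same or neighboring map nodes have similar codon usage under this encoding. Interpreting a cluster as highly expressed or horizontally transferred would require additional evidence that this sketch does not provide.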
