Revisiting the CAI from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models

Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias in their codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. Both indices rest on fairly simple assumptions about which genes are most highly expressed, reflecting what was known when they were first derived: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, it should be possible to improve on these assumptions. Here, we measure, in yeast, the degree to which taking current genome-wide expression datasets into account improves the performance of both indices. We find that changing the parameterization of each model can somewhat improve its correlation with actual expression levels, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This approach has some similarity with the CAI formalism and yields an alternative model for predicting expression levels from the coding sequences of genes. More information is at
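
For readers unfamiliar with the CAI formalism mentioned in the abstract, the following is a minimal sketch of the standard definition: each codon gets a relative adaptiveness weight (its frequency in a reference set of highly expressed genes divided by the frequency of the most common synonymous codon), and a gene's CAI is the geometric mean of the weights of its codons. The function names, the toy reference set, and the way the pseudocount is applied (added to every codon count to avoid zero weights) are assumptions made for illustration, not the parameterization used in the paper.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Compute a weight per codon from a reference set of highly expressed genes.

    Stop codons and single-codon amino acids (Met, Trp) are excluded,
    since they carry no information about synonymous codon choice.
    The pseudocount is an illustrative assumption to avoid zero weights.
    """
    counts = Counter(
        codon
        for seq in reference_seqs
        for codon in split_codons(seq)
        if codon in CODON_TO_AA
    )
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)

    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        top = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over the scored codons of one gene."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons that can be scored")
    return math.exp(sum(logs) / len(logs))


# Toy usage: a hypothetical one-gene reference set that prefers GCT for alanine.
toy_weights = relative_adaptiveness(["ATGGCTGCTGCTGCC"])
toy_score = cai("ATGGCTGCC", toy_weights)
```

Re-parameterizing the index, in the sense described in the abstract, corresponds here to passing a different set of sequences as `reference_seqs`, for example genes ranked highly in a genome-wide expression dataset.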
