Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models.

Highly expressed genes in many bacteria and small eukaryotes often show a strong compositional bias in codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, exploit this bias to predict the expression level of genes. When these indices were first introduced, they rested on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. With the recent advent of genome-wide expression data, it should be possible to improve on these assumptions. Here, we measure, in yeast, the degree to which taking current genome-wide expression data sets into account improves the performance of both indices. We find that re-parameterizing each model somewhat improves its correlation with actual expression levels, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias among highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This approach has some similarity to the CAI formalism and yields an alternative model for predicting expression levels from the coding sequences of genes. More information is available at http://bioinfo.mbb.yale.edu/expression/codons.
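
For readers unfamiliar with the CAI, the following is a minimal sketch of the standard formulation: each codon receives a relative adaptiveness weight (its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon), and a gene's CAI is the geometric mean of the weights of its codons. The sketch is illustrative only and is not the parameterization studied here. The three-family codon table, the function names, and the toy reference sequence are assumptions made for the example, as is the convention of replacing a zero count with 0.5.

```python
import math
from collections import Counter

# Partial synonymous-codon table, for illustration only (assumption):
# a real analysis would cover the full genetic code.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
}


def codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, zero_count=0.5):
    """Weight each codon by its count relative to the most frequent
    synonymous codon in the reference set.

    A codon never seen in the reference set is given `zero_count`
    (assumed convention) so that its log weight stays finite.
    """
    counts = Counter()
    for seq in reference_seqs:
        counts.update(codons(seq))
    weights = {}
    for family in SYNONYMOUS.values():
        adjusted = {c: (counts[c] if counts[c] > 0 else zero_count) for c in family}
        top = max(adjusted.values())
        for codon, n in adjusted.items():
            weights[codon] = n / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the codons in `seq`.

    Codons missing from the weight table are skipped.
    """
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons with a known weight")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (hypothetical): AAG occurs three times, AAA once.
w = relative_adaptiveness(["AAGAAGAAGAAA"])
print(cai("AAGAAG", w))  # gene using only the preferred Lys codon
print(cai("AAAAAG", w))  # gene mixing preferred and non-preferred codons
```

Re-parameterizing the index in the sense discussed above would correspond to changing the reference set passed to `relative_adaptiveness`, for example by choosing it from genome-wide expression data rather than from a small hand-picked list of genes.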
