Use and misuse of correspondence analysis in codon usage studies.

Correspondence analysis has frequently been used in codon usage studies, but the method is often misused. Because amino acid composition constrains codon usage, it is common to analyse tables of relative codon frequencies (or ratios of frequencies) rather than raw codon counts in order to remove these amino acid biases. The problem is that some important properties of correspondence analysis, such as row weighting, are lost in the process. Moreover, relative measures sometimes introduce other biases and often reduce the amount of information available for analysis, occasionally leading to errors of interpretation. For instance, in Borrelia burgdorferi, the use of relative measures led to the conclusion that there was no translational selection, whereas analyses based on codon counts show that a selective effect at that level is possible. In this paper, we describe these problems and propose alternative strategies to correspondence analysis for studying codon usage biases when amino acid composition effects must be removed.
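The loss of row weighting can be illustrated with a minimal sketch. In correspondence analysis each row (here, a gene) receives a mass proportional to its row total, so a long gene counts for more than a short one. The toy table below is purely hypothetical (two invented genes, two amino acids), and the helper names `row_masses` and `relative_frequencies` are illustrative, not taken from the paper. Converting counts to within-amino-acid relative frequencies forces every row to the same total, so all genes end up with equal mass.

```python
# Toy example: how relative codon frequencies erase row weights.
# All gene names and counts are hypothetical, chosen only for illustration.

# Synonymous codon families (standard genetic code) used in this toy table.
SYNONYMS = {"Lys": ["AAA", "AAG"], "Phe": ["TTT", "TTC"]}

# Raw codon counts per gene: one long gene, one short gene.
counts = {
    "long_gene": {"AAA": 90, "AAG": 30, "TTT": 60, "TTC": 20},
    "short_gene": {"AAA": 3, "AAG": 1, "TTT": 1, "TTC": 3},
}


def row_masses(table):
    """Row mass used by correspondence analysis: row total / grand total."""
    totals = {gene: sum(row.values()) for gene, row in table.items()}
    grand_total = sum(totals.values())
    return {gene: total / grand_total for gene, total in totals.items()}


def relative_frequencies(table):
    """Divide each codon count by the total for its amino acid in that gene."""
    result = {}
    for gene, row in table.items():
        result[gene] = {}
        for codons in SYNONYMS.values():
            aa_total = sum(row[c] for c in codons)
            for c in codons:
                # An amino acid absent from a gene yields 0.0 for all its codons.
                result[gene][c] = row[c] / aa_total if aa_total else 0.0
    return result


masses_from_counts = row_masses(counts)
masses_from_relative = row_masses(relative_frequencies(counts))

print("masses from counts:  ", masses_from_counts)
print("masses from relative:", masses_from_relative)
```

With raw counts the long gene carries most of the mass, whereas after the transformation both genes weigh the same, even though the short gene's frequencies rest on only eight codons and are therefore much noisier.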
