Compensation for nucleotide bias in a genome by representation as a discrete channel with noise

MOTIVATION: Calculating the information content of motifs in genomes that are highly biased in nucleotide composition is likely to overestimate the amount of useful information in the motif. Calculating relative information can compensate for such biases; however, the resulting information content is the amount seen by an observer, not by a macromolecule binding to the motif. The latter is needed to calculate the discriminatory power of the motif and to compare motifs between species.

RESULTS: By treating a biased genome as a discrete channel with noise, in accordance with Shannon's information theory, we were able to remove both 'distortion' and 'noise' from the motif and recover a more instructive biological 'signal'. A Java application, LogoPaint, was developed to remove nucleotide bias distortion and triplet frequency noise from motifs, calculate information content, and present the motif as a logo. We demonstrate how this technique can 'unmask' motifs in the translation initiation regions of bacteria that are obscured by strong sequence biases.

AVAILABILITY: LogoPaint is available to all users from the authors as an executable JAR file. Source code is available by arrangement.
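The overestimate described under Motivation can be illustrated with a minimal Python sketch. It is not LogoPaint's channel-based correction, which the abstract does not specify in detail; it only contrasts the conventional per-position information content, which assumes equiprobable bases, with the relative information (Kullback-Leibler divergence) against a biased background. The function names and the AT-rich background frequencies are assumptions chosen for illustration, and no small-sample correction is applied.

```python
import math
from collections import Counter

BASES = "ACGT"


def column_frequencies(sequences, position):
    """Observed base frequencies at one aligned motif position."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}


def uniform_information(freqs):
    """Information in bits assuming equiprobable bases: 2 - H(position)."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy


def relative_information(freqs, background):
    """Relative information in bits: KL divergence from the background."""
    return sum(
        p * math.log2(p / background[b]) for b, p in freqs.items() if p > 0
    )


# Hypothetical AT-rich genome composition (an assumption, not source data).
background = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}

# A column that merely mirrors the genome composition carries no signal,
# yet the uniform-background measure still credits it with about 0.28 bits.
print(uniform_information(background))
print(relative_information(background, background))
```

In this sketch the relative measure correctly assigns zero bits to a position that only reflects genome composition. As the abstract notes, that value is still the information seen by an observer rather than by the binding macromolecule, which is the gap the channel treatment is intended to address.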
