Comparative Analysis of Transcription Start Sites Using Mutual Information

The transcription start site (TSS) region shows greater variability than other promoter elements. We examine this variability using information content as a measure. We find that the variability is significant in the block of 5 nucleotides (nt) surrounding the TSS compared with the block of 15 nt. This suggests that the region actually involved may be 5–10 nt in size. For Escherichia coli, the information content from dinucleotide substitution matrices shows clearly better discrimination, suggesting the presence of some correlations. For human, however, this effect is much weaker, and for mouse it is practically absent. We conclude that the presence of short-range correlations within the TSS region is species-dependent rather than universal. We further observe that the mitochondrial control element contains other variable regions apart from the TSS. We also note that effective comparisons can be made only on blocks; single-nucleotide comparisons give no detectable signal.
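As an illustration of the kind of measure involved, the following minimal Python sketch estimates the mutual information, in bits, between aligned positions of two sequences, using either single nucleotides (k = 1) or dinucleotides (k = 2) as symbols. It is not the procedure used in this study: the function name, the restriction to one pair of gap-free aligned sequences, the use of overlapping dinucleotide windows, and the absence of pseudocounts are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(seq_a, seq_b, k=1):
    """Estimate mutual information (bits) between aligned k-mers.

    Hypothetical helper for illustration: seq_a and seq_b are assumed to be
    gap-free, equal-length, aligned sequences. k=1 compares single
    nucleotides; k=2 compares overlapping dinucleotides.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Collect aligned k-mer pairs from every window position.
    pairs = [(seq_a[i:i + k], seq_b[i:i + k])
             for i in range(len(seq_a) - k + 1)]
    if not pairs:
        raise ValueError("sequences are shorter than k")
    n = len(pairs)
    joint = Counter(pairs)                 # observed pair counts
    left = Counter(a for a, _ in pairs)    # marginal counts in seq_a
    right = Counter(b for _, b in pairs)   # marginal counts in seq_b
    # Sum of p(a,b) * log2(p(a,b) / (p(a) * p(b))) over observed pairs;
    # the ratio simplifies to count * n / (left_count * right_count).
    return sum((c / n) * log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())


if __name__ == "__main__":
    s = "ACGT" * 5
    print(mutual_information(s, s, k=1))   # identical sequences
    print(mutual_information(s, s, k=2))   # dinucleotide symbols
```

On short blocks such as the 5 nt and 15 nt windows discussed above, raw frequency estimates of this kind are noisy and biased upward, so in practice counts would be pooled over many aligned promoter sequences before any comparison is made.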
