Comparative analysis of core promoter region: Information content from mono and dinucleotide substitution matrices

We studied the core promoter region in five sets of promoter sequences by calculating the average mutual information content H (relative entropy). To do so, we constructed nucleotide substitution matrices that score mono- and dinucleotide replacements within a block of aligned sequences. The scores are in log-odds form and are expressed in bits of information. We applied these matrices to the core promoter region to calculate the information content of the Transcription Start Site (TSS), the TATA-box and the downstream regions. As expected, the information content decreases with increasing block size, which suggests that the TSS region is about 5-10 bases long. In both mouse and human, the TATA-box and the TSS region are likely to play important roles in proper transcriptional initiation.
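The abstract does not give the exact construction of the matrices, so the following is only a minimal sketch of how a log-odds substitution matrix and its relative entropy H could be computed from one block of aligned sequences. It assumes a BLOSUM-style pair count over alignment columns with ordered pairs, and background frequencies taken from the same block; the function names (`log_odds_matrix`, `relative_entropy`, `to_dinucleotides`) and the toy blocks are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations
from math import log2


def to_dinucleotides(seq):
    """Split a sequence into overlapping dinucleotides (assumed encoding)."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def log_odds_matrix(block):
    """Return pair frequencies q and log-odds scores (in bits) for a block.

    `block` is a list of equal-length aligned sequences; each sequence may be
    a string of nucleotides or a list of dinucleotide symbols.
    """
    pair_counts = Counter()
    for column in zip(*block):
        # Every ordered pair of symbols taken from two different sequences.
        pair_counts.update(permutations(column, 2))
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each symbol: marginal of the pair frequencies.
    p = Counter()
    for (a, _), freq in q.items():
        p[a] += freq

    # Log-odds score: observed pair frequency against chance expectation.
    scores = {(a, b): log2(freq / (p[a] * p[b])) for (a, b), freq in q.items()}
    return q, scores


def relative_entropy(q, scores):
    """Average information per aligned pair, H = sum of q_ab * s_ab, in bits."""
    return sum(q[pair] * scores[pair] for pair in q)


# Toy blocks (hypothetical data, not from any promoter set).
mono_block = ["TATAAA", "TATATA", "TATAAA", "TTTAAA"]
q, s = log_odds_matrix(mono_block)
print(f"mononucleotide H = {relative_entropy(q, s):.3f} bits")

di_block = [to_dinucleotides(seq) for seq in mono_block]
q2, s2 = log_odds_matrix(di_block)
print(f"dinucleotide H = {relative_entropy(q2, s2):.3f} bits")
```

Under these assumptions H is the mutual information between the two symbols of an aligned pair, so it is never negative and grows as the columns of the block become more conserved. Comparing H across blocks of different widths around the TSS is one way to reproduce the kind of block-size comparison described above.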
