Enrichment of oligonucleotide sets with transcription control signals. III: DNA from non-mammalian vertebrates

We studied the frequency distribution of the 1,048,576 possible oligonucleotides 10 bp long (decamers) in a sample of 1.072 × 10^6 bases of genes from non-mammalian vertebrates, comprising 322 sequences extracted from EMBL(R) 29.0, with the aim of detecting transcription control signals. Among all decamers, 2097 (0.2%) occurred more than 10 times as frequently as the mean and were subjected to further statistical analysis. For each of these 2097 decamers (parents), we counted the individual frequencies of the 30 decamers that differ from the parent by a single base substitution (progeny), and we calculated two variance/mean chi squares for the progeny, one with and one without the parent decamer. From the distribution of the ratio between the two chi squares, we observed that 1017 of the 2097 decamers had a chi square ratio between 1 and 1.5. In this final set, which corresponds to < 0.097% of all possible decamers, 75 decamers were found to contain 100 transcription control elements, such as CCAAT. The final set contains a large excess of signals compared with 100 random sets of 1017 decamers. Some of the decamers selected by this procedure are members of consensus sequences rather than unique sequences.
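
The counting and screening steps described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names are hypothetical, the variance/mean chi square is taken to be the usual index of dispersion (sum of squared deviations from the mean, divided by the mean), and the ratio is assumed to be the chi square with the parent divided by the chi square without it, since the abstract does not state the direction of the ratio.

```python
from collections import Counter

BASES = "ACGT"
K = 10  # decamers, as in the abstract


def count_kmers(sequences, k=K):
    """Count every overlapping k-mer; windows with non-ACGT symbols are skipped."""
    counts = Counter()
    valid = set(BASES)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


def frequent_kmers(counts, k=K, fold=10):
    """Return k-mers observed more than `fold` times the mean over all 4**k k-mers."""
    mean = sum(counts.values()) / 4 ** k
    return [kmer for kmer, c in counts.items() if c > fold * mean]


def progeny(parent):
    """All k-mers differing from `parent` by one substitution (30 for a decamer)."""
    return [
        parent[:i] + b + parent[i + 1:]
        for i in range(len(parent))
        for b in BASES
        if b != parent[i]
    ]


def dispersion_chi2(values):
    """Variance/mean chi square (assumed form): sum((x - mean)**2) / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((v - mean) ** 2 for v in values) / mean


def chi2_ratio(parent, counts):
    """Assumed ratio: chi square with the parent over chi square without it."""
    prog = [counts.get(p, 0) for p in progeny(parent)]
    without_parent = dispersion_chi2(prog)
    with_parent = dispersion_chi2(prog + [counts.get(parent, 0)])
    if without_parent == 0:
        return float("inf")
    return with_parent / without_parent
```

Under these assumptions, a parent would be retained when `chi2_ratio` falls between 1 and 1.5, that is, when adding the parent's count changes the dispersion of its progeny only modestly.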
