The analysis of intron data and their use in the detection of short signals

Summary

To examine whether certain short DNA sequences (putative splice signals) occurred in a particular region of an intron more often than would be expected by chance, the structure of intron data was first examined. In the rat and mouse introns studied, nucleotide frequencies departed significantly from equality, and successive nucleotides clearly did not occur independently. The nonindependence was due mainly to a shortage of CG dinucleotides and a less marked shortage of TA. However, the pairwise frequencies explained almost all the variability in triplet frequencies, so the data could be modeled approximately by nucleotide frequencies conditional on the preceding nucleotide. Some coding DNA was also examined; the pairs in the second and third positions of a codon, and in the third position of one codon and the first position of the next, showed departures from independence similar to those of the intron data. Using the probability model derived for the intron data, expected frequencies of putative signals were calculated and compared with the observed frequencies.
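The model described above, in which each nucleotide depends only on the one before it, can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`fit_first_order`, `motif_probability`, `expected_count`) and the toy sequences are hypothetical. It assumes that the probability of a short signal is the frequency of its first base multiplied by the successive conditional frequencies, and that the expected count in a region is that probability times the number of possible start positions.

```python
from collections import Counter

ALPHABET = "ACGT"

def fit_first_order(seqs):
    """Estimate base frequencies and P(next base | previous base)."""
    singles = Counter()
    pairs = Counter()
    for s in seqs:
        singles.update(c for c in s if c in ALPHABET)
        pairs.update(zip(s, s[1:]))  # adjacent pairs within one sequence
    total = sum(singles.values())
    freq = {a: singles[a] / total for a in ALPHABET}
    trans = {}
    for a in ALPHABET:
        # Only pairs made of A, C, G, T contribute to the row total.
        row = sum(pairs[(a, b)] for b in ALPHABET)
        trans[a] = {b: (pairs[(a, b)] / row if row else 0.0) for b in ALPHABET}
    return freq, trans

def motif_probability(motif, freq, trans):
    """Probability that the motif starts at a given position under the model."""
    p = freq[motif[0]]
    for prev, nxt in zip(motif, motif[1:]):
        p *= trans[prev][nxt]
    return p

def expected_count(motif, freq, trans, region_length):
    """Expected number of occurrences of the motif in a region of given length."""
    positions = max(region_length - len(motif) + 1, 0)
    return positions * motif_probability(motif, freq, trans)

if __name__ == "__main__":
    # Toy sequences, invented for illustration only (not real intron data).
    toy_introns = ["GTAAGTCTGACTTTCAG", "GTGAGTACTAACCCTTTCAG"]
    freq, trans = fit_first_order(toy_introns)
    print(expected_count("CTGAC", freq, trans, region_length=50))
```

The expected count obtained this way would then be set against the observed count of the signal in the same region; the choice of test statistic for that comparison is left open here.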
