Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites.

An information analysis of the 5' (donor) and 3' (acceptor) sequences spanning the ends of nearly 1800 human introns provides evidence for structural features of splice sites that bear on spliceosome evolution and function. (1) 82% of the sequence information (i.e., sequence conservation) at donor junctions and 97% of the sequence information at acceptor junctions are confined to the introns, leaving codon choices throughout exons largely unrestricted. The distribution of information at intron-exon junctions is also described in detail and compared with footprints. (2) Acceptor sites possess enough information to be located in the transcribed portion of the human genome, whereas donor sites possess about one bit less than the information needed to locate them independently. This difference suggests that acceptor sites are located first in humans and that, once located, they reduce by a factor of two the number of alternative sites available as donors. Direct experimental evidence supports this conclusion. (3) The sequences of donor and acceptor splice sites are strikingly similar. This suggests that the two junctions derive from a common ancestor and that, during evolution, the information of both sites shifted onto the intron. If so, the protein and RNA components of contemporary spliceosomes that recognize donor and acceptor sequences should also be related. This conclusion is supported by the common structures found in different parts of the spliceosome.
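
The two quantities the abstract compares can be illustrated with a minimal Python sketch. It assumes the usual Shannon formulation for four equiprobable bases, in which the information at an aligned position is 2 minus the entropy of the observed base frequencies, and the information needed to locate sites is the base-2 logarithm of the number of candidate positions per site. The function names, the toy sequences, and the search-space numbers are hypothetical illustrations, not data or code from the paper, and the small-sample correction that a real analysis would apply is omitted.

```python
import math
from collections import Counter


def position_information(aligned_sites):
    """Information (bits) at each column of equal-length aligned sites.

    Assumes four equiprobable bases, so the maximum is 2 bits per position.
    No small-sample correction is applied (a simplification).
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    info = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


def bits_needed_to_locate(search_space, n_sites):
    """Bits needed to pick n_sites out of search_space candidate positions."""
    return math.log2(search_space / n_sites)


# Hypothetical toy alignment of donor-like sequences (not real data).
toy_donors = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
per_position = position_information(toy_donors)
total_information = sum(per_position)

# A one-bit shortfall corresponds to a factor of two in candidate positions:
# halving the search space lowers the information needed by exactly one bit.
shortfall = bits_needed_to_locate(1000, 1) - bits_needed_to_locate(500, 1)
```

Under these assumptions, a fully conserved position contributes 2 bits and a position with uniform base usage contributes none. This is the sense in which a site that is about one bit short of the information needed to locate it can still be found once a prior event has halved the number of alternatives.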
