Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA.

Prediction of splice site selection and efficiency from sequence inspection is of fundamental interest (it tests the current knowledge of requisite sequence features) and of practical importance (genome annotation, design of mutant or transgenic organisms). In plants, the dominant variables affecting splice site selection and efficiency include the degree of matching to the extended splice site consensus and the local gradient of U- and G+C-composition (introns being U-rich and exons G+C-rich). We present a novel method for splice site prediction, trained specifically for maize and Arabidopsis thaliana. The method extends our previous algorithm based on logitlinear models by considering three variables simultaneously: intrinsic splice site strength, local optimality, and fit with respect to the overall splice pattern prediction. We show that the method considerably improves prediction specificity without compromising the high degree of sensitivity required in gene prediction algorithms. Applications to gene identification are illustrated for Arabidopsis and suggest that successful methods must combine scoring for splice sites, coding potential, and similarity with potential homologs in non-trivial ways. A WWW version of the SplicePredictor program is available at http://gnomic.stanford.edu/volker/SplicePredictor.html
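The two ideas named above, a composition gradient across a candidate site and local optimality among neighbouring candidates, can be illustrated with a minimal sketch. This is not the SplicePredictor implementation: the function names, the weights, the intercept, and the window sizes are hypothetical values chosen for illustration, not trained parameters. The sketch also leaves out matching to the extended splice site consensus and the fit to the overall splice pattern.

```python
import math


def composition(seq, bases):
    """Fraction of positions in seq whose base is in `bases` (0.0 if seq is empty)."""
    if not seq:
        return 0.0
    return sum(1 for b in seq if b in bases) / len(seq)


def donor_features(seq, pos, flank=50):
    """Composition contrasts around a candidate donor site whose GU starts at pos.

    Upstream of pos is the putative exon, downstream the putative intron.
    """
    upstream = seq[max(0, pos - flank):pos]
    downstream = seq[pos:pos + flank]
    u_gradient = composition(downstream, "U") - composition(upstream, "U")
    gc_gradient = composition(upstream, "GC") - composition(downstream, "GC")
    return u_gradient, gc_gradient


def logit_score(features, weights, intercept):
    """Logistic transform of a linear combination of the features."""
    z = intercept + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def candidate_donors(seq, weights=(4.0, 4.0), intercept=-2.0, flank=50):
    """Score every GU dinucleotide; weights and intercept are illustrative assumptions."""
    seq = seq.upper().replace("T", "U")
    scored = []
    for pos in range(len(seq) - 1):
        if seq[pos:pos + 2] == "GU":
            features = donor_features(seq, pos, flank)
            scored.append((pos, logit_score(features, weights, intercept)))
    return scored


def locally_optimal(scored, window=20):
    """Keep candidates with no strictly higher-scoring candidate within `window` nt."""
    return [
        (p, s)
        for p, s in scored
        if all(s >= s2 for p2, s2 in scored if abs(p - p2) <= window)
    ]
```

For example, `locally_optimal(candidate_donors(dna))` returns the positions and scores of the GU sites that outscore all of their neighbours within the window. A real predictor would estimate the model coefficients from annotated splice sites rather than fix them by hand.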
