Identical sequence patterns in the ends of exons and introns of human protein-coding genes

Intron splicing is one of the most important steps in the maturation of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites have not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and the last 15 nucleotides of the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and that, on average, the sequence identity length is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns tend to share longer identical sequences when both come from the same gene than when they are taken from different genes. This result holds even when the exon and intron are not consecutive in transcription order.
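As an illustration of the comparison described above, the following is a minimal sketch, not the authors' implementation. It assumes that "sequence identity length" means the number of consecutive matching nucleotides counted backwards from the 3' end of both sequences; the abstract does not state the definition explicitly. The function name `shared_3prime_length` and the example sequences are hypothetical.

```python
def shared_3prime_length(exon_tail: str, intron_tail: str, window: int = 15) -> int:
    """Count identical nucleotides shared at the 3' ends of two sequences.

    Assumption: identity is measured as the length of the common suffix,
    restricted to the last `window` nucleotides (15 in the abstract).
    """
    # Keep only the last `window` nucleotides, ignoring letter case.
    exon_end = exon_tail.upper()[-window:]
    intron_end = intron_tail.upper()[-window:]

    length = 0
    # Walk both sequences from the 3' end towards the 5' end.
    for exon_nt, intron_nt in zip(reversed(exon_end), reversed(intron_end)):
        if exon_nt != intron_nt:
            break
        length += 1
    return length


# Hypothetical sequences: an exon ending in ...TTCAG and an intron ending in ...TTCAG.
print(shared_3prime_length("ACGTTCAG", "TTTTTTCCTTTCAG"))  # 5
```

Averaging this value over all exon-intron pairs within a gene, and comparing it with the average over pairs drawn from different genes, would give the kind of within-gene versus between-gene comparison the abstract reports.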
