Identify Alternative Splicing Events Based on Position-Specific Evolutionary Conservation

The evolution of eukaryotes has been accompanied by increasingly complex alternative splicing, which greatly expands the information encoded in the genome. One of the greatest challenges of the post-genome era is to fully characterize the human transcriptome, including its alternative splicing. Here, we introduce a comparative genomics approach that systematically identifies alternative splicing events. It relies on the differential evolutionary conservation of exons and introns and on the high-quality annotation of the ENCODE regions. Specifically, we focus on exons that are included in some transcripts but completely spliced out of others; we call these conditional exons. First, we characterize the features that distinguish conditional exons, constitutive exons and introns. One of the most important features is the position-specific conservation score. Conservation scores differ dramatically between conditional and constitutive exons, and, more importantly, the differences are position-specific. In the flanking intronic regions, the differences between conditional and constitutive exons are also position-specific. Using the Random Forests algorithm, we classify conditional exons with high specificity: 97% for identifying conditional exons within intronic regions and 95% for classifying known exons. Sensitivity is moderate, at 64% and 32%, respectively. Applying the method to the human genome, we identified 39,640 introns that contain conditional exons and classified 8,813 conditional exons from the current RefSeq exon list. Of these, 31,673 introns containing conditional exons and 5,294 conditional exons classified from known exons cannot be inferred from RefSeq, UCSC or Ensembl annotations. Some of these de novo predictions were experimentally verified.
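To make the idea of a position-specific conservation score concrete, the sketch below assembles a feature vector from per-base conservation scores. It samples the scores at fixed offsets around the two splice sites of a candidate exon. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the conservation track or the window widths, so the function name `position_specific_features` and the parameters `intron_flank` and `exon_flank` (with their default values) are hypothetical. A vector like this could serve as input to a classifier such as Random Forests.

```python
from typing import List, Sequence


def position_specific_features(
    scores: Sequence[float],
    exon_start: int,
    exon_end: int,
    intron_flank: int = 20,  # hypothetical: intronic bases kept on each side
    exon_flank: int = 10,    # hypothetical: exonic bases kept at each end
) -> List[float]:
    """Sample per-base conservation scores at fixed offsets around an exon.

    scores: per-nucleotide conservation scores for a genomic region (0-based).
    exon_start, exon_end: half-open, 0-based exon coordinates within `scores`.
    Each returned element corresponds to one fixed offset relative to a splice
    site, so the same index means the same relative position for every exon.
    """
    if exon_start - intron_flank < 0 or exon_end + intron_flank > len(scores):
        raise ValueError("flanking window extends beyond the scored region")
    if exon_end - exon_start < 2 * exon_flank:
        raise ValueError("exon is too short for the requested exonic window")

    # Upstream intron tail plus the first exonic bases (acceptor / 3' splice site).
    acceptor = list(scores[exon_start - intron_flank : exon_start + exon_flank])
    # Last exonic bases plus the downstream intron head (donor / 5' splice site).
    donor = list(scores[exon_end - exon_flank : exon_end + intron_flank])
    return acceptor + donor


if __name__ == "__main__":
    # Toy scores: position i simply has score i, so offsets are easy to read.
    toy_scores = [float(i) for i in range(200)]
    features = position_specific_features(toy_scores, exon_start=50, exon_end=100)
    print(len(features))  # 60 features: 30 around the acceptor, 30 around the donor
```

Each offset is kept as its own feature instead of being averaged over the window, and that is what makes the vector position-specific. A classifier can then weight, for example, the intronic bases immediately next to a splice site differently from bases deeper in the intron.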