Prediction of alternatively spliced exons using Support Vector Machines

Alternative splicing is a mechanism for generating different gene transcripts (called isoforms) from the same genomic sequence. In this paper, we explore the predictive power of a large, diverse set of gene features that have been shown experimentally to affect alternative splicing. We use these features to build support vector machine classifiers that predict alternatively spliced exons. Experimental results show that classifiers built from the diverse feature set outperform those that use only basic sequence features. Furthermore, we apply feature selection methods to identify the features most informative for this prediction problem.
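To make the general idea concrete, the following is a minimal sketch of training a linear support vector machine and ranking features by the magnitude of their learned weights. It is an illustration under stated assumptions, not the implementation or the feature set used in the paper: the toy data, the feature names `feature_a` and `feature_b`, the hyperparameters, and the weight-based ranking are all hypothetical, and a real study would more likely rely on an established SVM library and a dedicated feature selection method.

```python
# Illustrative sketch only: toy data and hypothetical feature names,
# not the features or the SVM implementation used in the paper.

def train_linear_svm(X, y, lam=0.01, eta=0.1, steps=200):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Gradient of the regulariser (lam / 2) * ||w||^2.
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # example lies inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
feature_names = ["feature_a", "feature_b"]
X = [[2.0, 0.5], [1.5, -0.5], [2.5, 0.3], [1.8, -0.3],
     [-2.0, 0.5], [-1.5, -0.5], [-2.5, 0.3], [-1.8, -0.3]]
y = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = alternatively spliced, -1 = constitutive

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]

# Rank features by absolute weight, largest first.
ranking = sorted(feature_names,
                 key=lambda name: -abs(w[feature_names.index(name)]))
print(predictions, ranking)
```

In this toy setting the weight on the uninformative feature stays near zero, so it falls to the bottom of the ranking; on real data, weight magnitudes are only meaningful when the features are on comparable scales.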
