Model-based detection of alternative splicing signals

Motivation: Transcripts from ∼95% of human multi-exon genes are subject to alternative splicing (AS). Interest in AS is driven by its prominent contribution to transcriptome and proteome complexity and by the role of aberrant AS in numerous diseases. Recent technological advances enable thousands of exons to be profiled simultaneously across diverse cell types and cellular conditions. Making use of such data requires accurate identification of condition-specific splicing changes, in order to elucidate the underlying regulatory programs or link the changes to specific diseases.

Results: We present a probabilistic model tailored for high-throughput AS data, in which observed isoform levels are explained as combinations of condition-specific AS signals. Under this formulation, the tasks for a given AS dataset are to detect the common signals in the data and to identify the exons relevant to each signal. The model can incorporate prior knowledge about underlying AS signals, measurement quality and gene expression level effects. Using a large-scale multi-tissue AS dataset, we demonstrate the advantage of our method over standard alternative approaches. In addition, we describe newly found tissue-specific AS signals, which were verified experimentally, and discuss associated regulatory features.

Contact: yoseph@psi.utoronto.ca; frey@psi.utoronto.ca

Supplementary information: Supplementary data are available at Bioinformatics online.
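The formulation described under Results, in which observed isoform levels are combinations of condition-specific signals and each signal is relevant to a subset of exons, can be illustrated with a small linear sketch. This is not the authors' probabilistic model: it assumes the signals are already known (standing in for prior knowledge), uses synthetic data, and replaces inference with ordinary least squares plus a threshold. The function names (`simulate`, `fit_loadings`, `relevant_exons`), the block-shaped signals and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_exons=200, n_conditions=12, n_signals=3, noise=0.05, seed=0):
    """Synthetic data: exons x conditions = loadings @ signals + noise."""
    rng = np.random.default_rng(seed)
    # Assumption: each signal is a simple condition profile that is "on"
    # in one contiguous block of conditions (e.g. a group of tissues).
    signals = np.zeros((n_signals, n_conditions))
    block = n_conditions // n_signals
    for k in range(n_signals):
        signals[k, k * block:(k + 1) * block] = 1.0
    # Sparse loadings: each exon responds to at most one signal (-1 = none).
    loadings = np.zeros((n_exons, n_signals))
    member = rng.integers(-1, n_signals, size=n_exons)
    for e, k in enumerate(member):
        if k >= 0:
            sign = rng.choice([-1.0, 1.0])  # inclusion up or down
            loadings[e, k] = sign * rng.uniform(0.5, 1.0)
    data = loadings @ signals
    data = data + noise * rng.standard_normal((n_exons, n_conditions))
    return data, signals, loadings


def fit_loadings(data, signals):
    """Least-squares loadings, assuming the signals are known in advance."""
    # data ~= L @ signals, so L = data @ pinv(signals).
    return data @ np.linalg.pinv(signals)


def relevant_exons(loadings, threshold=0.25):
    """Boolean exons x signals mask of loadings judged non-zero."""
    return np.abs(loadings) > threshold


if __name__ == "__main__":
    data, signals, true_loadings = simulate()
    est = fit_loadings(data, signals)
    mask = relevant_exons(est)
    print("exons assigned to each signal:", mask.sum(axis=0))
```

A full treatment along the lines of the abstract would also have to detect the signals themselves and model measurement quality and expression-level effects, which this sketch leaves out.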
