A sequence-based, deep learning model accurately predicts RNA splicing branchpoints

Experimental detection of RNA splicing branchpoints is difficult. To date, high-confidence experimental annotations exist for 18% of 3' splice sites in the human genome. We develop a deep-learning-based branchpoint predictor, LaBranchoR, which predicts a correct branchpoint for at least 75% of 3' splice sites genome-wide. Detailed analysis of cases in which our predicted branchpoint deviates from experimental data suggests a correct branchpoint is predicted in over 90% of cases. We use our predicted branchpoints to identify a novel sequence element upstream of branchpoints consistent with extended U2 snRNA base-pairing, show an association between weak branchpoints and alternative splicing, and explore the effects of genetic variants on branchpoints. We provide genome-wide branchpoint annotations and in silico mutagenesis scores at http://bejerano.stanford.edu/labranchor.
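As an illustration of what an in silico mutagenesis score can look like, the sketch below substitutes every base in a short sequence and records how the top predicted branchpoint score changes. It is a minimal sketch under stated assumptions: `toy_branchpoint_scores` is a hypothetical stand-in for a trained model (it rewards an A with a T two bases upstream, loosely following the branchpoint consensus) and is not LaBranchoR, and the scoring definition used here (change in the maximum per-position score) is an assumption, not necessarily the one behind the published annotations.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"


def toy_branchpoint_scores(seq: str) -> List[float]:
    """Hypothetical stand-in for a trained predictor: one score per position.

    Assumption for illustration only: an A scores 1.0, plus 0.5 if a T sits
    two bases upstream. A real model would be learned from data.
    """
    scores = []
    for i, base in enumerate(seq):
        score = 0.0
        if base == "A":
            score += 1.0
            if i >= 2 and seq[i - 2] == "T":
                score += 0.5
        scores.append(score)
    return scores


def in_silico_mutagenesis(
    seq: str, scorer: Callable[[str], List[float]]
) -> Dict[Tuple[int, str], float]:
    """Map (position, alternate base) to the change in the best score.

    Negative values mean the substitution weakens the strongest
    predicted branchpoint in the sequence.
    """
    reference_best = max(scorer(seq))
    effects = {}
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = max(scorer(mutant)) - reference_best
    return effects


if __name__ == "__main__":
    effects = in_silico_mutagenesis("CCTGACCC", toy_branchpoint_scores)
    # List the most disruptive substitutions first.
    for (pos, alt), delta in sorted(effects.items(), key=lambda kv: kv[1])[:3]:
        print(pos, alt, delta)
```

Any function that maps a sequence to per-position scores can be passed as `scorer`, so the same loop would apply to a real predictor in place of the toy one.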
