Predicting sequence and structural specificities of RNA binding regions recognized by splicing factor SRSF1

Background
RNA-binding proteins (RBPs) play diverse roles in eukaryotic RNA processing. Despite their pervasive functions in the biogenesis and regulation of coding and noncoding RNAs, elucidating the sequence specificities that define protein-RNA interactions remains a major challenge. Recently, CLIP-seq (cross-linking immunoprecipitation followed by high-throughput sequencing) has been applied successfully to study the transcriptome-wide binding patterns of the SRSF1, PTBP1, NOVA and FOX2 proteins. These studies either adopted traditional motif-finding methods such as Multiple EM for Motif Elicitation (MEME) to discover the sequence consensus of RBP binding sites, or used Z-score statistics to search for overrepresented oligonucleotides of a fixed length. We argue that most of these methods are not well suited to RNA motif identification, because they cannot incorporate the RNA structural context of protein-RNA interactions, which may affect binding specificity. Here, we describe a novel model-based approach, RNAMotifModeler, that identifies the consensus of protein-RNA binding regions by integrating sequence features with RNA secondary structure.

Results
As an example, we applied RNAMotifModeler to SRSF1 (SF2/ASF) CLIP-seq data. The sequence-structural consensus we identified is the purine-rich octamer 'AGAAGAAG' in a highly single-stranded RNA context. The unpaired probabilities of the binding-site bases (the probabilities that they do not form base pairs) are significantly higher than those of negative controls and of the sequences flanking the binding site, indicating that SRSF1 tends to bind single-stranded RNA. Further statistical evaluation revealed that the second and fifth bases of the SRSF1 octamer motif have much stronger sequence specificity but weaker single-strandedness, whereas the third, fourth, sixth and seventh bases are far more likely to be single-stranded but have more degenerate sequence specificity. We therefore hypothesize that nucleotide specificity and secondary structure play complementary roles in binding-site recognition by SRSF1.

Conclusion
In this study, we presented a computational model that predicts the sequence consensus and optimal RNA secondary structure of protein-RNA binding regions. Its successful application to SRSF1 CLIP-seq data demonstrates great potential for improving our understanding of the binding specificity of RNA-binding proteins.
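The Background mentions Z-score statistics for finding overrepresented oligonucleotides of a fixed length. The sketch below shows one common form of that sequence-only baseline. It assumes a binomial model in which each k-mer position in the CLIP sequences is drawn with the k-mer's background frequency, estimated with an add-one pseudocount. The function names are hypothetical, and the cited CLIP-seq studies may have defined the statistic differently. Because the score uses sequence alone, it cannot account for the structural context that the abstract argues matters.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_zscores(foreground: Iterable[str], background: Iterable[str], k: int) -> Dict[str, float]:
    """Binomial Z-score of each k-mer's foreground count versus background frequency.

    Hypothetical helper: p is the background frequency with an add-one pseudocount
    over all 4**k possible k-mers, so unseen k-mers never get p = 0.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    scores: Dict[str, float] = {}
    for kmer in set(fg) | set(bg):
        p = (bg[kmer] + 1) / (n_bg + 4 ** k)
        expected = n_fg * p
        sd = math.sqrt(n_fg * p * (1.0 - p))
        scores[kmer] = (fg[kmer] - expected) / sd
    return scores


if __name__ == "__main__":
    # Toy sequences only; these are not CLIP-seq data.
    z = kmer_zscores(["GAAGAAGAAG"] * 5, ["CUCUCUCUCU"] * 5, k=3)
    for kmer, score in sorted(z.items(), key=lambda kv: -kv[1]):
        print(kmer, round(score, 2))
```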
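The unpaired probability of base i follows directly from a base-pair probability matrix P, such as one computed with McCaskill's partition-function algorithm (implemented, for example, in the Vienna RNA package): P_unpaired(i) = 1 - Σ_j P(i, j). The sketch below assumes the matrix has already been computed and is supplied as a symmetric nested list; it does not call any folding library. It also compares the mean unpaired probability inside a putative binding site with that of its flanks, in the spirit of the site-versus-flank comparison in the Results. The helper names and the toy matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def symmetric_bpp(n: int, pairs: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Build a symmetric n x n base-pair probability matrix from {(i, j): P(i, j)}."""
    bpp = [[0.0] * n for _ in range(n)]
    for (i, j), p in pairs.items():
        bpp[i][j] = p
        bpp[j][i] = p
    return bpp


def unpaired_probabilities(bpp: Sequence[Sequence[float]]) -> List[float]:
    """P_unpaired(i) = 1 - sum over j != i of P(i, j)."""
    n = len(bpp)
    return [1.0 - sum(bpp[i][j] for j in range(n) if j != i) for i in range(n)]


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def site_vs_flanks(p_unp: Sequence[float], start: int, width: int, flank: int) -> Tuple[float, float]:
    """Mean unpaired probability inside [start, start + width) and in up to
    `flank` bases on each side of it (flanks are truncated at sequence ends)."""
    site = list(p_unp[start:start + width])
    flanks = list(p_unp[max(0, start - flank):start]) + list(p_unp[start + width:start + width + flank])
    if not site or not flanks:
        raise ValueError("site and flanks must both be non-empty")
    return _mean(site), _mean(flanks)


if __name__ == "__main__":
    # Toy 12-nt matrix: a short helix closes a loop spanning positions 3..8.
    bpp = symmetric_bpp(12, {(0, 11): 0.9, (1, 10): 0.8, (2, 9): 0.5})
    p_unp = unpaired_probabilities(bpp)
    print([round(p, 2) for p in p_unp])
    print(site_vs_flanks(p_unp, start=4, width=4, flank=3))
```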
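The Results report that binding-site unpaired probabilities are significantly higher than those of negative controls, but the abstract does not say which test was used. One possible choice is a one-sided permutation test on the difference in means, which needs only the standard library. This is an illustrative substitute, not necessarily the authors' procedure. The function name is hypothetical.

```python
import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test_greater(x: Sequence[float], y: Sequence[float],
                             n_perm: int = 10000, seed: int = 0) -> float:
    """One-sided permutation p-value for H1: mean(x) > mean(y).

    Labels are shuffled n_perm times. The +1 in numerator and denominator
    counts the observed split itself, so the p-value is never exactly zero.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)
    observed = _mean(x) - _mean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if _mean(pooled[:n_x]) - _mean(pooled[n_x:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy unpaired probabilities, not measured values.
    sites = [0.85, 0.9, 0.78, 0.95, 0.88]
    controls = [0.4, 0.55, 0.35, 0.6, 0.5]
    print(permutation_test_greater(sites, controls, n_perm=5000, seed=1))
```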
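Position-specific sequence specificity, such as the stronger specificity reported for the second and fifth bases of the octamer, is commonly quantified as the information content of each column of a position frequency matrix built from aligned sites. The sketch below assumes the sites are already aligned to the same width, uses a uniform background over A, C, G and U, and applies a small pseudocount. The function names are hypothetical, and the input sequences in the usage example are made-up toy octamers, not SRSF1 CLIP-seq data.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGU"


def position_frequency_matrix(sites: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Per-position base frequencies from aligned sites (DNA 'T' is read as 'U')."""
    sites = [s.upper().replace("T", "U") for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must be aligned to the same width")
    pfm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pfm.append({b: c / total for b, c in counts.items()})
    return pfm


def information_content(column: Dict[str, float]) -> float:
    """Bits relative to a uniform background: 2 + sum of p * log2(p)."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def consensus(pfm: Sequence[Dict[str, float]]) -> str:
    """Most frequent base at each position."""
    return "".join(max(col, key=col.get) for col in pfm)


if __name__ == "__main__":
    toy_sites = ["AGAAGAAG", "AGGAGAAG", "GGAAGGAG", "AGAAGAAC"]
    pfm = position_frequency_matrix(toy_sites)
    print(consensus(pfm))  # AGAAGAAG
    print([round(information_content(col), 2) for col in pfm])
```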
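The abstract describes a model that integrates sequence consensus with RNA secondary structure, and hypothesizes that the two play complementary roles. It does not give RNAMotifModeler's scoring function, so the following is a hypothetical stand-in rather than the authors' method. Each window is scored by a sequence log-odds term against a uniform background plus a position-weighted log unpaired-probability term, and the highest-scoring window is reported. The per-position `structure_weights` are an assumption meant to reflect that some motif positions appear more structurally constrained than others. The frequency matrix and unpaired probabilities in the usage example are toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def window_score(seq: str, p_unp: Sequence[float], start: int,
                 pfm: Sequence[Dict[str, float]], structure_weights: Sequence[float],
                 background: float = 0.25, eps: float = 1e-6) -> float:
    """Hypothetical combined score for the window beginning at `start`."""
    score = 0.0
    for k, column in enumerate(pfm):
        base = seq[start + k]
        # Sequence term: log-odds of this base versus a uniform background.
        score += math.log2(column[base] / background)
        # Structure term: penalizes bases that are likely to be paired.
        score += structure_weights[k] * math.log2(max(p_unp[start + k], eps))
    return score


def best_site(seq: str, p_unp: Sequence[float], pfm: Sequence[Dict[str, float]],
              structure_weights: Sequence[float]) -> Tuple[float, int]:
    """Return (score, start) of the highest-scoring window; ties go to the later start."""
    width = len(pfm)
    if len(p_unp) != len(seq) or len(structure_weights) != width or len(seq) < width:
        raise ValueError("inconsistent sequence, probability, or weight lengths")
    return max((window_score(seq, p_unp, s, pfm, structure_weights), s)
               for s in range(len(seq) - width + 1))


def consensus_pfm(consensus: str, match: float = 0.7) -> List[Dict[str, float]]:
    """Toy matrix: `match` for the consensus base, the remainder split evenly."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in "ACGU"} for c in consensus]


if __name__ == "__main__":
    pfm = consensus_pfm("AGAAGAAG")
    seq = "AGAAGAAGCCAGAAGAAG"          # two exact copies of the consensus
    p_unp = [1.0] * 10 + [0.1] * 8      # second copy is likely base-paired
    print(best_site(seq, p_unp, pfm, [1.0] * 8))  # picks start 0
```

With all structure weights set to zero the two copies tie on sequence alone, which is one way to see how a structure term can break such ties.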