SSMART: Sequence-structure motif identification for RNA-binding proteins

Motivation: RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. Hundreds of RBPs are encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific combination of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites remains uncharacterized.

Results: We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet that extends the IUPAC nucleotide codes to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART recovers both sequence and structure motifs implanted into 3'UTR-like sequences, across varying proportions of structured and unstructured binding sites. In addition, we applied SSMART to high-throughput in vivo and in vitro data, showing that it not only recovers the known sequence motif but also provides insight into the structural preferences of the RBP.

Availability and implementation: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/.

Supplementary information: Supplementary data are available at Bioinformatics online.
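To make the idea of a sequence-structure consensus string concrete, the following is a minimal sketch and not SSMART's actual alphabet or implementation. It pairs the standard IUPAC nucleotide codes with a hypothetical three-letter structure code (`p` paired, `u` unpaired, `x` either), and scans a sequence whose per-nucleotide structure annotation is assumed to be already available. The function names, the structure code, and the toy input are all assumptions made for illustration.

```python
# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}

# Hypothetical structure codes (an assumption, not SSMART's alphabet):
# p = paired, u = unpaired, x = either state is accepted.
STRUCT = {"p": "p", "u": "u", "x": "pu"}


def matches_at(seq, struct, motif_seq, motif_struct, start):
    """Return True if the motif matches seq/struct beginning at `start`."""
    for offset, (nt_code, st_code) in enumerate(zip(motif_seq, motif_struct)):
        pos = start + offset
        # Both the nucleotide and its structural state must be allowed.
        if seq[pos] not in IUPAC[nt_code]:
            return False
        if struct[pos] not in STRUCT[st_code]:
            return False
    return True


def find_sites(seq, struct, motif_seq, motif_struct):
    """Return all start positions where the sequence-structure motif matches."""
    if len(seq) != len(struct):
        raise ValueError("seq and struct must have the same length")
    if len(motif_seq) != len(motif_struct):
        raise ValueError("motif_seq and motif_struct must have the same length")
    width = len(motif_seq)
    return [
        start
        for start in range(len(seq) - width + 1)
        if matches_at(seq, struct, motif_seq, motif_struct, start)
    ]


# Toy input: a short RNA with a paired stem flanking an unpaired stretch.
seq = "GGCUGUAAAUAGCC"
struct = "pppuuuuuuuuppp"

unpaired_hits = find_sites(seq, struct, "UGUANAUA", "uuuuuuuu")
paired_hits = find_sites(seq, struct, "UGUANAUA", "pppppppp")
any_structure_hits = find_sites(seq, struct, "UGUANAUA", "xxxxxxxx")
```

In this toy case the same sequence consensus matches when an unpaired context is required or when structure is ignored, and fails when a paired context is required, which is the kind of distinction a sequence-only motif cannot express.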
