Leveraging cross-link modification events in CLIP-seq for motif discovery

High-throughput protein–RNA interaction data generated by CLIP-seq provide unprecedented access to the activities of RNA-binding proteins (RBPs), the key players in co- and post-transcriptional regulation of gene expression. Motif discovery is a necessary part of the follow-up analysis of CLIP-seq data, both to refine the exact locations of RBP binding sites and to characterize them. The specific properties of RBP binding sites and of CLIP-seq methods provide additional information not usually present in the classic motif discovery problem: the structure of the binding site, and cross-linking-induced events in reads. We show that CLIP-seq data contain clear secondary structure signals, as well as technology- and RBP-specific cross-link signals. We introduce Zagros, a motif discovery algorithm specifically designed to leverage this information, and we explore its impact on the quality of recovered motifs. Our results indicate that using both secondary structure and cross-link modifications can greatly improve motif discovery on CLIP-seq data. Further, the motifs we recover provide insight into the balance that RBP binding strikes between sequence specificity and structure specificity.
