Identification of high-confidence RNA regulatory elements by combinatorial classification of RNA–protein binding sites

Crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled researchers to characterize the transcriptome-wide binding sites of RNA-binding proteins (RBPs) at high resolution. We apply a soft-clustering method, RBPgroup, to various CLIP-seq datasets to group together RBPs that specifically bind the same RNA sites. Such combinatorial clustering of RBPs helps interpret CLIP-seq data and suggests functional RNA regulatory elements. Furthermore, we validate two RBP–RBP interactions in cell lines. Our approach links proteins and RNA motifs known to possess similar biochemical and cellular properties and can, when used in conjunction with additional experimental data, identify high-confidence RBP groups and their associated RNA regulatory elements.
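
As a rough illustration of what soft clustering of RBPs by shared binding sites can look like, the sketch below factorizes a toy site-by-RBP binding matrix with a plain nonnegative matrix factorization (NMF). This is an assumption made for illustration: the abstract does not state which algorithm RBPgroup uses, and the function names (`nmf`, `soft_memberships`), the RBP labels, and the toy matrix are all hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor V (sites x RBPs) as W @ H using multiplicative updates.

    Illustrative stand-in only; not the RBPgroup implementation.
    """
    rng = np.random.default_rng(seed)
    n_sites, n_rbps = V.shape
    W = rng.random((n_sites, k)) + eps   # site loadings per group
    H = rng.random((k, n_rbps)) + eps    # RBP loadings per group
    for _ in range(n_iter):
        # Updates for the squared-error objective; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def soft_memberships(H, eps=1e-9):
    """Normalize each RBP's loadings so they sum to 1 across groups."""
    return H / (H.sum(axis=0, keepdims=True) + eps)

# Hypothetical toy data: rows are RNA sites, columns are RBPs,
# an entry of 1 means the RBP has a CLIP-seq site at that location.
rbps = ["RBP_A", "RBP_B", "RBP_C", "RBP_D"]
V = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

W, H = nmf(V, k=2)
M = soft_memberships(H)  # shape: (groups, RBPs)

for name, weights in zip(rbps, M.T):
    print(name, np.round(weights, 3))
```

Each column of `M` gives one RBP's fractional membership in every group, so an RBP that binds with several partners can belong to more than one group rather than being forced into a single cluster.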
