Inferring RNA sequence preferences for poorly studied RNA-binding proteins based on co-evolution

Background
Characterizing the binding preferences of RNA-binding proteins (RBPs) is essential for understanding how an RBP interacts with its RNA targets and for deciphering the mechanisms of post-transcriptional regulation. Experimental methods have been used to generate protein-RNA binding data for a number of RBPs in vivo and in vitro. Using these binding data, several computational methods have been developed to detect the RNA sequence or structure preferences of RBPs. However, the majority of RBPs have not yet been experimentally characterized and lack RNA binding data. For these poorly studied RBPs, most existing computational methods cannot identify binding preferences, because they require experimental binding data as input.

Results
Here we propose a new co-evolution-based method that predicts the sequence preferences of poorly studied RBPs without requiring their binding data. First, we demonstrate the co-evolutionary relationship between RBPs and their RNA partners. We then present a K-nearest neighbors (KNN) based algorithm that infers the sequence preference of an RBP using only the preference information of its homologous RBPs. When benchmarked on several in vitro and in vivo datasets, our method outperforms, on all datasets, the existing alternative that uses the closest neighbor’s preference. Moreover, it performs comparably to two state-of-the-art methods that require experimental binding data. Finally, we demonstrate how the method can be used to infer sequence preferences for novel proteins that have no binding preference information available.

Conclusion
For a poorly studied RBP, current methods for determining its binding preference need experimental data, which are expensive and time-consuming to obtain. Determining an RBP’s preference is therefore impractical in many situations. This study provides an economical solution for inferring the sequence preference of such proteins based on co-evolution. The source code and related datasets are available at https://github.com/syang11/KNN.
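
As a rough illustration of the KNN idea described in the Results, the sketch below infers a preference vector for a query RBP as a similarity-weighted average over its K most similar homologs. The abstract does not specify the similarity measure, the weighting scheme, or how preferences are represented, so everything here is an assumption made for illustration: the function name `knn_infer_preference`, the use of a precomputed protein similarity score, the flat numeric preference vectors, and the toy values. It is not the implementation in the authors' repository.

```python
def knn_infer_preference(similarities, preferences, k=3):
    """Similarity-weighted average of the k most similar homologs' preferences.

    similarities: one score per homolog (assumed: higher means more similar
                  to the query RBP, e.g. a protein sequence identity in [0, 1]).
    preferences:  one equal-length numeric vector per homolog (assumed: any
                  fixed-length encoding of a sequence preference, such as
                  k-mer scores or a flattened position weight matrix).
    """
    if not similarities or len(similarities) != len(preferences):
        raise ValueError("need one preference vector per similarity score")
    k = max(1, min(k, len(similarities)))

    # Indices of the k homologs with the highest similarity to the query.
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)[:k]

    # Negative scores are treated as zero weight; if every weight is zero,
    # fall back to a plain unweighted mean.
    weights = [max(similarities[i], 0.0) for i in ranked]
    total = sum(weights)
    if total == 0:
        weights = [1.0] * k
        total = float(k)

    dim = len(preferences[0])
    return [
        sum(w * preferences[i][d] for w, i in zip(weights, ranked)) / total
        for d in range(dim)
    ]


# Toy data (hypothetical): three homologs with 2-dimensional preferences.
toy_similarities = [0.9, 0.2, 0.6]
toy_preferences = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

inferred = knn_infer_preference(toy_similarities, toy_preferences, k=2)
nearest_only = knn_infer_preference(toy_similarities, toy_preferences, k=1)
```

With `k=1` the function reduces to copying the closest homolog's preference, which corresponds to the nearest-neighbor baseline mentioned in the Results; larger `k` blends several homologs, weighted by their similarity to the query.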
