RNA-binding residues in sequence space: Conservation and interaction patterns

RNA-binding proteins (RBPs) perform fundamental and diverse functions within the cell. Approximately 15% of proteins sequences are annotated as RNA-binding, but with a significant number of proteins without functional annotation, many RBPs are yet to be identified. A percentage of uncharacterised proteins can be annotated by transferring functional information from proteins sharing significant sequence homology. However, genomes contain a significant number of orphan open reading frames (ORFs) that do not share significant sequence similarity to other ORFs, but correspond to functional proteins. Hence methods for protein function annotation that go beyond sequence homology are essential. One method of annotation is the identification of ligands that bind to proteins, through the characterisation of binding site residues. In the current work RNA-binding residues (RBRs) are characterised in terms of their evolutionary conservation and the patterns they form in sequence space. The potential for such characteristics to be used to identify RBPs from sequence is then evaluated. In the current work the conservation of residues in 261 RBPs is compared for (a) RBRs vs. non-RBRs surface residues, and for (b) specific and non-specific RBRs. The analysis shows that RBRs are more conserved than other surface residues, and RBRs hydrogen-bonded to the RNA backbone are more conserved than those making hydrogen bonds to RNA bases. This observed conservation of RBRs was then used to inform the construction of RBR sequence patterns from known protein-RNA structures. A series of RBR patterns were generated for a case study protein aspartyl-tRNA synthetase bound to tRNA; and used to differentiate between RNA-binding and non-RNA-binding protein sequences. Six sequence patterns performed with high precision values of >80% and recall values 7 times that of an homology search. When the method was expanded to the complete dataset of 261 proteins, many patterns were of poor predictive value, as they had not been manipulated on a family-specific basis. However, two patterns with precision values > or = 85% were used to make function predictions for a set of hypothetical proteins. This revealed a number of potential RBPs that require experimental verification.

[1]  Chris P. Ponting,et al.  Issues in Predicting Protein Function From Sequence , 2001, Briefings Bioinform..

[2]  Daniel R. Caffrey,et al.  Are protein–protein interfaces more conserved in sequence than the rest of the protein surface? , 2004, Protein science : a publication of the Protein Society.

[3]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[4]  Baldomero Oliva,et al.  Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships , 2005, Bioinform..

[5]  J. Thornton,et al.  Satisfying hydrogen bonding potential in proteins. , 1994, Journal of molecular biology.

[6]  Christian J. A. Sigrist,et al.  Nucleic Acids Research Advance Access published November 14, 2007 The 20 years of PROSITE , 2007 .

[7]  V. Ramakrishnan,et al.  Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics , 2000, Nature.

[8]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[9]  Christian von Mering,et al.  STRING 8—a global view on proteins and their functional interactions in 630 organisms , 2008, Nucleic Acids Res..

[10]  Janet M Thornton,et al.  Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity. , 2002, Journal of molecular biology.

[11]  D. Moras,et al.  Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp) , 1991, Science.

[12]  Mona Singh,et al.  Predicting functionally important residues from sequence conservation , 2007, Bioinform..

[13]  S. Jones,et al.  Protein-RNA interactions: a structural analysis. , 2001, Nucleic acids research.

[14]  Ana C. Messias,et al.  Structural Basis for Recognition of the Intron Branch Site RNA by Splicing Factor 1 , 2001, Science.

[15]  Daniel Fischer,et al.  Structural biology sheds light on the puzzle of genomic ORFans. , 2004, Journal of molecular biology.

[16]  J. Janin,et al.  Dissecting protein–RNA recognition sites , 2008, Nucleic acids research.

[17]  D. Higgins,et al.  Finding flexible patterns in unaligned protein sequences , 1995, Protein science : a publication of the Protein Society.

[18]  A. Milzani,et al.  Computer-assisted mathematical analysis of sigmoid biological events , 1986, Comput. Appl. Biosci..

[19]  Dinesh Gupta,et al.  Distribution of proline-rich (PxxP) motifs in distinct proteomes: functional and therapeutic implications for malaria and tuberculosis. , 2004, Protein engineering, design & selection : PEDS.

[20]  Gabriele Varani,et al.  RNA is rarely at a loss for companions; as soon as RNA , 2008 .

[21]  Rodrigo Lopez,et al.  Multiple sequence alignment with the Clustal series of programs , 2003, Nucleic Acids Res..

[22]  Shandar Ahmad,et al.  Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information , 2004, Bioinform..

[23]  Cheng-Yan Kao,et al.  Evolutionary conservation of DNA-contact residues in DNA-binding domains , 2007, Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007).

[24]  George Hripcsak,et al.  Technical Brief: Agreement, the F-Measure, and Reliability in Information Retrieval , 2005, J. Am. Medical Informatics Assoc..

[25]  W. S. Valdar,et al.  Scoring residue conservation , 2002, Proteins.

[26]  E Westhof,et al.  Statistical analysis of atomic contacts at RNA–protein interfaces , 2001, Journal of molecular recognition : JMR.

[27]  Adam Godzik,et al.  New avenues in protein function prediction , 2006, Protein science : a publication of the Protein Society.

[28]  Jonathan J. Ellis,et al.  Protein–RNA interactions: Structural analysis and functional classes , 2006, Proteins.

[29]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[30]  Kyungsook Han,et al.  Discovering the interaction propensities of amino acids and nucleotides from protein-RNA complexes. , 2003, Molecules and cells.

[31]  J. Thornton,et al.  Protein–protein interfaces: Analysis of amino acid conservation in homodimers , 2001, Proteins.

[32]  I. Longden,et al.  EMBOSS: the European Molecular Biology Open Software Suite. , 2000, Trends in genetics : TIG.

[33]  Daniel Herschlag,et al.  Diverse RNA-Binding Proteins Interact with Functionally Related Sets of RNAs, Suggesting an Extensive Regulatory System , 2008, PLoS biology.