Identifying RNA-binding residues based on evolutionary conserved structural and energetic features

Increasing numbers of protein structures are solved each year, but many of these structures belong to proteins whose sequences are homologous to sequences already in the Protein Data Bank. Nevertheless, the structures of homologous proteins belonging to the same family contain useful information because functionally important residues are expected to preserve physico-chemical, structural and energetic features. This information forms the basis of our method, which detects RNA-binding residues of a given RNA-binding protein as those residues that preserve physico-chemical, structural and energetic features in its homologs. Tests on 81 RNA-bound and 35 RNA-free protein structures showed that our method yields a higher fraction of true RNA-binding residues among its predictions (higher precision) than two structure-based and two sequence-based machine-learning methods. Because the method requires no training data set and has no parameters, its precision does not degrade when applied to ‘novel’ protein sequences, unlike methods that are parameterized for a given training data set. The method was used to predict the ‘unknown’ RNA-binding residues in the C-terminal RNA-binding domain of human CPEB3. The two predicted residues, F430 and F474, were experimentally verified to bind RNA; in particular, mutating F430 to alanine or asparagine nearly abolished RNA binding. The method has been implemented in a webserver called DR_bind1, which is freely available with no login requirement at http://drbind.limlab.ibms.sinica.edu.tw.
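
The core idea, that a residue is predicted as RNA-binding when it preserves a feature across homologs, and the precision measure used in the comparison can be illustrated with a minimal Python sketch. This is not the DR_bind1 implementation: the strict "flagged in every homolog" consensus rule, the function names, and the toy data are all assumptions made for illustration, and the abstract does not specify how the features are computed or combined.

```python
from typing import Dict, Set


def conserved_positions(flags_by_homolog: Dict[str, Set[int]]) -> Set[int]:
    """Return alignment positions flagged in every homolog.

    Assumption: each set holds aligned positions at which some feature
    (physico-chemical, structural or energetic) is present in that protein.
    The strict intersection is a stand-in for the real conservation test.
    """
    if not flags_by_homolog:
        return set()
    return set.intersection(*flags_by_homolog.values())


def precision(predicted: Set[int], true_binding: Set[int]) -> float:
    """Fraction of predicted residues that are true RNA-binding residues."""
    if not predicted:
        return 0.0
    return len(predicted & true_binding) / len(predicted)


# Hypothetical toy data: aligned positions flagged in a query and two homologs.
flags = {
    "query": {10, 25, 40, 57},
    "homolog_a": {10, 25, 57, 80},
    "homolog_b": {10, 25, 33, 57},
}
predicted = conserved_positions(flags)

# Hypothetical set of experimentally known RNA-binding positions.
known = {10, 57, 90}

print(sorted(predicted))            # positions conserved in all three proteins
print(precision(predicted, known))  # true positives / all predictions
```

Because a rule of this kind has no fitted parameters, there is nothing to overfit to a training set, which is the property the abstract credits for the method's stable precision on novel sequences.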
