Classifying RNA-Binding Proteins Based on Electrostatic Properties

Protein structure can provide new insight into the biological function of a protein and can guide the design of better experiments to study its biological roles. Moreover, deciphering the interactions of a protein with other molecules can contribute to understanding the protein's function within cellular processes. In this study, we apply a machine learning approach to classify RNA-binding proteins based on their three-dimensional structures. The method characterizes unique properties of electrostatic patches on the protein surface. Using an ensemble of general protein features and specific properties extracted from the electrostatic patches, we trained a support vector machine (SVM) to distinguish RNA-binding proteins from other positively charged proteins that do not bind nucleic acids. In particular, when applied to proteins possessing the RNA recognition motif (RRM), the method successfully distinguished RNA-binding RRM domains from RRM domains involved in protein–protein interactions. Overall, the method achieves 88% accuracy in classifying RNA-binding proteins, yet it cannot distinguish RNA-binding from DNA-binding proteins. Nevertheless, by applying a multiclass SVM approach, we were able to classify the RNA-binding proteins by their RNA targets, specifically, whether they bind ribosomal RNA (rRNA), transfer RNA (tRNA), or messenger RNA (mRNA). Finally, because the approach presented here does not rely on sequence or structural homology, it could be applied to identify novel RNA-binding proteins with unique folds and/or binding motifs.
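
The multiclass step described above can be illustrated with a minimal sketch. It assumes a linear SVM trained one-vs-rest by stochastic subgradient descent on the hinge loss; the abstract does not state which kernel, multiclass scheme, or software was used, so these choices are assumptions. The feature vectors are synthetic two-dimensional points, not real electrostatic-patch properties, and all function names are hypothetical.

```python
import math
import random

# Class labels taken from the RNA targets named in the abstract.
CLASSES = ["rRNA", "tRNA", "mRNA"]

def make_synthetic_data(n_per_class, seed):
    """Generate 2-D toy feature vectors (ASSUMPTION: not real patch features)."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(len(CLASSES)):
        angle = 2 * math.pi * k / len(CLASSES)
        cx, cy = 4 * math.cos(angle), 4 * math.sin(angle)
        for _ in range(n_per_class):
            # The trailing 1.0 is a constant feature that acts as the bias term.
            X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5), 1.0])
            y.append(k)
    return X, y

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(X, signs, epochs=200, lr=0.01, lam=0.001, seed=0):
    """Linear SVM fitted by SGD on the L2-regularized hinge loss; signs are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # True when the sample is misclassified or inside the margin.
            violated = signs[i] * dot(w, X[i]) < 1
            for j in range(len(w)):
                grad = lam * w[j] - (signs[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
    return w

def train_one_vs_rest(X, y):
    """Train one binary SVM per class (that class versus all others)."""
    return [train_binary_svm(X, [1 if label == k else -1 for label in y])
            for k in range(len(CLASSES))]

def predict(models, x):
    """Return the index of the class whose SVM gives the highest score."""
    scores = [dot(w, x) for w in models]
    return scores.index(max(scores))

def accuracy(models, X, y):
    return sum(predict(models, x) == label for x, label in zip(X, y)) / len(y)

X_train, y_train = make_synthetic_data(30, seed=1)
X_test, y_test = make_synthetic_data(30, seed=2)
models = train_one_vs_rest(X_train, y_train)
print("held-out accuracy:", accuracy(models, X_test, y_test))
```

The toy clusters are deliberately well separated, so the accuracy printed here says nothing about performance on real proteins; the sketch only shows how per-class binary SVM scores can be combined into a multiclass decision.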
