Prediction of RNA-Binding Proteins by Voting Systems

Identifying which proteins can interact with RNA is important for protein annotation, since RNA-protein interactions influence the structure of the ribosome and play important roles in gene expression. This paper uses voting systems to identify RNA-binding proteins. First, 34 learning algorithms available in Weka are chosen for investigation. A simple majority voting system (SMVS) is then used to predict RNA-binding proteins, achieving an average overall prediction accuracy (ACC) of 79.72% and an average Matthews correlation coefficient (MCC) of 59.77% on the independent testing dataset. Next, the mRMR (minimum redundancy maximum relevance) strategy, originally a feature selection method, is adapted to algorithm selection, and the MCC value of each classifier is used as the weight of that classifier's vote. The best average MCC, 64.70% on the independent testing dataset, is attained when 22 algorithms are selected and integrated through weighted votes; the corresponding ACC is 82.04%.
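The difference between simple majority voting and MCC-weighted voting can be illustrated with a minimal Python sketch. It is not the paper's Weka-based implementation: the function names, the 0/1 label encoding (1 = RNA-binding), the tie-breaking rule, and the clamping of negative MCC weights to zero are all assumptions made for illustration.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def majority_vote(votes):
    """Simple majority over 0/1 votes; ties go to class 0 (an assumption)."""
    return 1 if 2 * sum(votes) > len(votes) else 0

def weighted_vote(votes, weights):
    """Each classifier's vote counts with its weight, e.g. its MCC on held-out data.

    Negative weights are clamped to zero here, which is an assumption;
    ties go to class 0.
    """
    pos = sum(max(w, 0.0) for v, w in zip(votes, weights) if v == 1)
    neg = sum(max(w, 0.0) for v, w in zip(votes, weights) if v == 0)
    return 1 if pos > neg else 0

# Three hypothetical classifiers vote on one protein.
votes = [1, 0, 0]
weights = [0.8, 0.3, 0.3]  # hypothetical per-classifier MCC values
simple_result = majority_vote(votes)
weighted_result = weighted_vote(votes, weights)
```

In this toy case the simple majority predicts non-binding, while the weighted vote follows the single classifier with the highest MCC and predicts binding. This shows how weighting can change the ensemble decision.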
