Recognition of microRNA-binding sites in proteins from sequences using Laplacian Support Vector Machines with a hybrid feature

Recognizing microRNA (miRNA)-binding residues in proteins would improve our understanding of how miRNAs silence their target genes and of related biological processes. Because labeled examples are scarce, traditional supervised methods such as support vector machines (SVMs) do not work well on this problem. We therefore propose a semi-supervised learning method, the Laplacian Support Vector Machine (LapSVM), which recognizes miRNA-binding residues in proteins from sequences by using both labeled and unlabeled data. Instances are encoded with a hybrid feature that incorporates evolutionary information of the amino acid sequence and mutual interaction propensities in protein-miRNA complex structures. The results indicate that the LapSVM model achieves good performance, with an F1 score of 22.06±0.28% and an AUC (area under the ROC curve) value of 0.760±0.043. A web server called MBindR has been built and is freely available at http://cbi.njupt.edu.cn/MBindR/MBindR.htm for academic use.
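The idea of learning from both labeled and unlabeled data through a graph Laplacian can be illustrated with a small sketch. It is not the LapSVM model or the hybrid feature described above: it swaps in Laplacian regularized least squares (LapRLS), a closely related manifold-regularization method that has a closed-form solution, and runs on synthetic 2-D points rather than residue features. All function names, parameter values, and the toy data are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def knn_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized k-nearest-neighbor graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest neighbors
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def fit_laprls(X, y, labeled_mask, gamma_a=1e-2, gamma_i=1.0, sigma=1.0, k=5):
    """Solve (J K + gamma_a*l*I + gamma_i*l/n^2 * L K) alpha = Y for alpha.

    Labels are +1/-1; entries of y at unlabeled positions are ignored.
    """
    n = X.shape[0]
    l = int(labeled_mask.sum())               # number of labeled points
    K = rbf_kernel(X, X, sigma)
    L = knn_laplacian(X, k)
    J = np.diag(labeled_mask.astype(float))   # selects the labeled rows
    Y = np.where(labeled_mask, y, 0.0)        # unlabeled targets set to zero
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, Y)


def predict(alpha, X_train, X_new, sigma=1.0):
    """Decision scores f(x) = sum_i alpha_i K(x_i, x); the sign gives the class."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha


def make_toy_data(seed=0, n_per_class=20):
    """Two separated 2-D clusters with a single labeled point in each (toy data)."""
    rng = np.random.default_rng(seed)
    neg = rng.normal([-3.0, 0.0], 0.5, size=(n_per_class, 2))
    pos = rng.normal([3.0, 0.0], 0.5, size=(n_per_class, 2))
    X = np.vstack([neg, pos])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    mask = np.zeros(2 * n_per_class, dtype=bool)
    mask[0] = True                            # one labeled negative
    mask[n_per_class] = True                  # one labeled positive
    return X, y, mask
```

Calling `fit_laprls(*make_toy_data())` and then `predict` on the same points yields one score per point, and the sign of each score is the predicted class. The Laplacian term penalizes decision functions that change quickly between neighboring points, which is how the unlabeled points influence the fit when only two labels are available.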
