Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information

Identifying the protein residues that bind microRNA (miRNA) matters for several research areas, including gene regulation and expression. We propose a method, PmiRBR, which combines a novel hybrid feature with the Laplacian support vector machine (LapSVM) algorithm to predict miRNA-binding residues in protein sequences. The hybrid feature comprises secondary structure, conservation scores, and a novel feature that combines evolutionary information with the physicochemical properties of amino acids. Performance comparisons of the individual features indicate that the novel feature contributes the most to prediction improvement. PmiRBR achieves 85.96% overall accuracy, with 43.89% sensitivity and 90.56% specificity, and significantly outperforms other approaches to miRNA-binding residue prediction.
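The three reported figures are linked: overall accuracy is the class-weighted average of sensitivity and specificity. The Python sketch below is an illustration only, not part of PmiRBR. It computes the metrics from a confusion matrix and then inverts that relationship to estimate the fraction of binding residues implied by the reported numbers. The function names and the example counts are hypothetical, and the sketch assumes all three figures were measured on the same set of residues.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def implied_positive_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p.

    p is the fraction of residues labelled as binding. Assumes the three
    metrics come from the same evaluation set and that
    sensitivity != specificity.
    """
    return (specificity - accuracy) / (specificity - sensitivity)


# Hypothetical counts, chosen only to exercise the function.
acc, sens, spec = binary_metrics(tp=44, fn=56, tn=906, fp=94)

# Figures reported in the abstract, expressed as fractions.
frac = implied_positive_fraction(0.8596, 0.4389, 0.9056)
print(f"implied fraction of binding residues: {frac:.3f}")
```

Under that assumption, the reported values imply that roughly one residue in ten is a binding residue. This class imbalance explains why overall accuracy sits close to specificity and why sensitivity should be read alongside it.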
