Prediction of guide strand of microRNAs from its sequence and secondary structure

Background: MicroRNAs (miRNAs) are produced by the sequential processing of a long hairpin RNA transcript by Drosha and Dicer, both RNase III enzymes, and form transitory small RNA duplexes. The strand of the duplex that is incorporated into the RNA-induced silencing complex (RISC) and silences gene expression is called the guide strand, or miRNA; the other strand is degraded and is called the passenger strand, or miRNA*. Predicting the guide strand of a miRNA is important for a better understanding of RNA interference pathways.

Results: This paper describes support vector machine (SVM) models developed for predicting the guide strands of miRNAs. All models were trained and tested on a dataset consisting of 329 miRNA and 329 miRNA* pairs using five-fold cross-validation. First, models were developed using the mono-, di-, and tri-nucleotide composition of miRNA strands; these achieved highest accuracies of 0.588, 0.638 and 0.596, respectively. Second, models were developed using split nucleotide composition; these achieved maximum accuracies of 0.553, 0.641 and 0.602 for mono-, di-, and tri-nucleotides, respectively. Third, models were developed using binary patterns; these achieved a highest accuracy of 0.708. Integrating secondary structure features with the binary pattern raised the accuracy to 0.719. Finally, hybrid models were developed by combining various features; these achieved a maximum accuracy of 0.799, with sensitivity 0.781 and specificity 0.818. On an independent dataset, this model achieved an accuracy of 0.80. In addition, we compared the performance of our method with various siRNA-designing methods on miRNA and siRNA datasets.

Conclusion: This study presents, for the first time, a method for predicting the guide strand of a miRNA duplex. It demonstrates that the guide and passenger strands of miRNA precursors can be distinguished using their nucleotide sequence and secondary structure. The method will be useful in understanding microRNA processing and can be applied in RNA silencing technology to support biological and clinical research. A web server based on the SVM models described in this study is available at http://crdd.osdd.net:8081/RISCbinder/.
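
The sequence-derived features named in the Results (nucleotide composition, split nucleotide composition, and binary pattern) can be illustrated with a minimal Python sketch. The abstract does not define these encodings precisely, so the versions below follow common conventions and are assumptions rather than the authors' implementation: composition is taken as the fraction of overlapping k-mers, split composition as the same computed on consecutive segments and concatenated, and binary pattern as a four-value one-hot code per position. The function names, the A, C, G, U ordering, and the example sequence are all hypothetical choices made for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed nucleotide order; the abstract does not specify one


def kmer_composition(seq, k=1):
    """Return the fraction of each k-mer among all overlapping k-mers in seq."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows of length k
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows containing non-ACGU symbols are skipped
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def split_composition(seq, k=1, parts=2):
    """Concatenate k-mer compositions computed on consecutive segments of seq."""
    size = len(seq) // parts
    features = []
    for p in range(parts):
        start = p * size
        # The last segment absorbs any leftover nucleotides.
        end = (p + 1) * size if p < parts - 1 else len(seq)
        features.extend(kmer_composition(seq[start:end], k))
    return features


def binary_pattern(seq):
    """One-hot encode each position of seq as four 0/1 values."""
    vector = []
    for base in seq:
        vector.extend(1 if base == symbol else 0 for symbol in ALPHABET)
    return vector


if __name__ == "__main__":
    example = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt RNA sequence
    print(len(kmer_composition(example, 1)))      # mono-nucleotide vector length
    print(len(kmer_composition(example, 2)))      # di-nucleotide vector length
    print(len(kmer_composition(example, 3)))      # tri-nucleotide vector length
    print(len(split_composition(example, 1, 2)))  # two-part split, mono-nucleotide
    print(len(binary_pattern(example)))           # four values per position
```

Vectors of this kind, one per strand and labelled as guide or passenger, are the sort of input an SVM classifier could be trained on. Because the binary pattern grows with sequence length, strands of different lengths would need padding or truncation to a fixed length before training; how the study handled this is not stated in the abstract.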
