PHDcleav: a SVM based method for predicting human Dicer cleavage sites using sequence and secondary structure of miRNA precursors

Background: Dicer, an RNase III enzyme, plays a vital role in processing pre-miRNAs to generate miRNAs. The structural and sequence features of the pre-miRNA that determine the position and efficiency of cleavage are not well known. Precise cleavage by Dicer is crucial because inaccurate processing can produce miRNAs with different seed regions, which can alter the repertoire of target genes.

Results: In this study, a novel method has been developed to predict Dicer cleavage sites on pre-miRNAs using a Support Vector Machine. We used the dataset of experimentally validated human miRNA hairpins from miRBase and extracted fourteen nucleotides around Dicer cleavage sites. We developed a number of models using various types of features and achieved a maximum accuracy of 66% using the binary profile of the nucleotide sequence taken from the 5p arm of the hairpin. Prediction accuracy for Dicer cleavage sites improved significantly, from 66% to 86%, when we integrated secondary structure information. This indicates that secondary structure plays an important role in the selection of the cleavage site. All models were trained and tested on 555 experimentally validated cleavage sites and evaluated using 5-fold cross-validation. In addition, performance was evaluated on an independent testing dataset, where the method achieved an accuracy of ~82%.

Conclusion: Based on this study, we developed a webserver, PHDcleav (http://www.imtech.res.in/raghava/phdcleav/), to predict Dicer cleavage sites in pre-miRNAs. This tool can be used to investigate the functional consequences of genetic variations/SNPs in miRNAs on the Dicer cleavage site and on gene silencing. Moreover, it would also be useful in the discovery of miRNAs in the human genome and in the design of Dicer-specific pre-miRNAs for potent gene silencing.
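The binary profile and 5-fold evaluation described in the Results can be illustrated with a minimal sketch. It assumes that a binary profile means a one-hot encoding with four bits per nucleotide, so that a fourteen-nucleotide window yields 56 features. The alphabet order, the function names `binary_profile` and `five_fold_indices`, the interleaved fold assignment, and the example window are all hypothetical and are not taken from the PHDcleav implementation.

```python
# Illustrative sketch only: alphabet order, helper names, and the example
# window are assumptions, not details of the PHDcleav implementation.
NUCLEOTIDES = "ACGU"


def binary_profile(window: str) -> list[int]:
    """One-hot encode an RNA window using four bits per nucleotide."""
    window = window.upper().replace("T", "U")  # accept DNA-style input
    features: list[int] = []
    for base in window:
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def five_fold_indices(n_samples: int, k: int = 5) -> list[list[int]]:
    """Assign sample indices to k interleaved folds for cross-validation."""
    return [list(range(i, n_samples, k)) for i in range(k)]


if __name__ == "__main__":
    window = "UGAGGUAGUAGGUU"  # hypothetical 14-nt window around a cleavage site
    vector = binary_profile(window)
    print(len(vector))  # 14 nucleotides x 4 bits

    folds = five_fold_indices(555)  # 555 cleavage sites, as in the abstract
    print([len(fold) for fold in folds])
```

Each feature vector would then be passed to an SVM, training on four folds and testing on the fifth in turn. Secondary structure information could be appended to the same vector in a similar one-hot fashion, although the encoding actually used by the authors is not specified in the abstract.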
