MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features

To distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs), we use a hybrid feature set consisting of local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure, and the P-value of a randomization test. In addition, a novel machine-learning algorithm, random forest (RF), is introduced for classification. The results suggest that our method achieves 98.21% specificity and 95.09% sensitivity. Compared with the previous Triplet-SVM-classifier, our RF method is nearly 10% higher in total accuracy. Further analysis indicates that the improvement is due to both the combined features and the RF algorithm. The MiPred web server is available at http://www.bioinf.seu.edu.cn/miRNA/. Given a sequence, MiPred first decides whether it is a pre-miRNA-like hairpin sequence. If it is, the RF classifier then predicts whether it is a real pre-miRNA or a pseudo one.
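As an illustration of the three feature groups named above, the following is a minimal Python sketch and not the MiPred implementation. It assumes the structure-sequence composition is counted as 32 triplets (a middle nucleotide plus the paired/unpaired status of three adjacent positions), and that the randomization test uses a simple mononucleotide shuffle with P = R / (N + 1), where R is the number of shuffled sequences whose energy is not greater than the original. The function names `triplet_features`, `shuffle_p_value`, and `combined_features` are hypothetical, and `energy_fn` stands in for whatever RNA folding routine supplies the MFE.

```python
import random
from itertools import product

# 32 assumed triplet labels: middle nucleotide + paired "(" / unpaired "." at 3 positions
TRIPLETS = [n + "".join(s) for n in "ACGU" for s in product("(.", repeat=3)]


def triplet_features(seq, structure):
    """Normalized frequencies of the 32 structure-sequence triplets."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Only paired vs. unpaired matters here, so ')' is treated like '('
    paired = structure.replace(")", "(")
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # windows with symbols outside ACGU are skipped
            counts[key] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIPLETS]


def shuffle_p_value(seq, energy_fn, n_shuffles=1000, seed=0):
    """Empirical P-value: fraction of shuffles with energy <= the original's."""
    rng = random.Random(seed)
    observed = energy_fn(seq)
    letters = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # mononucleotide shuffle (an assumption)
        if energy_fn("".join(letters)) <= observed:
            hits += 1
    return hits / (n_shuffles + 1)


def combined_features(seq, structure, mfe, p_value):
    """Concatenate the triplet frequencies with the MFE and the P-value."""
    return triplet_features(seq, structure) + [mfe, p_value]
```

A vector built by `combined_features` could then be passed to any random forest implementation. The secondary structure and MFE would come from an external folding tool, which is deliberately left out of this sketch.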
