On the performance of pre-microRNA detection algorithms

MicroRNAs (miRNAs) are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases such as cancer; their analysis has therefore become popular. Because the experimental discovery of miRNAs is cumbersome, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches using all relevant published data sets as well as novel ones, judging algorithm performance by ten intrinsic performance measures. We present an extensible framework, izMiR, which allows existing algorithms to be compared without bias, new ones to be added, and multiple approaches to be combined into ensemble methods. Condensing the results of millions of computations, we show that no method is clearly superior; nevertheless, we provide a guideline to help biomedical researchers select a tool. Finally, we demonstrate that combining all of the methods into one ensemble approach allows, for the first time, reliable and purely computational pre-miRNA detection in large eukaryotic genomes.

Because the experimental discovery of microRNAs (miRNAs) is cumbersome, computational tools have been developed for the prediction of pre-miRNAs. Here the authors develop a framework to assess the performance of existing and novel pre-miRNA prediction tools and provide guidelines for selecting an appropriate approach for a given data set.
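The two evaluation ideas in the abstract, scoring each detector with several performance measures and combining detectors into an ensemble, can be illustrated with a minimal Python sketch. It assumes binary labels (1 = pre-miRNA, 0 = negative hairpin) and computes a few standard measures (sensitivity, specificity, precision, F1, accuracy, Matthews correlation coefficient); the abstract does not list the study's ten measures, so this set is only illustrative. The ensemble rule is a plain majority vote, chosen here as an assumption; it is not necessarily how izMiR combines its member algorithms. The helper names (`confusion`, `measures`, `majority_vote`) and the toy predictions are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = pre-miRNA, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError for degenerate cases.
    return num / den if den else 0.0


def measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """A few common classifier measures (illustrative, not the study's exact set)."""
    c = confusion(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    sens = _ratio(tp, tp + fn)   # recall on true pre-miRNAs
    spec = _ratio(tn, tn + fp)   # recall on negative hairpins
    prec = _ratio(tp, tp + fp)   # fraction of positive calls that are real
    return {
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "f1": _ratio(2 * prec * sens, prec + sens),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn,
                      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def majority_vote(predictions: List[Sequence[int]]) -> List[int]:
    """Call a candidate positive when a strict majority of predictors do.

    Assumption: simple hard voting; ties (even number of predictors) count as 0.
    """
    n = len(predictions)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical toy data: 3 true pre-miRNAs followed by 5 negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    tool_a = [1, 1, 0, 1, 0, 0, 0, 0]
    tool_b = [1, 0, 1, 0, 1, 0, 0, 0]
    tool_c = [0, 1, 1, 0, 0, 1, 0, 0]
    ensemble = majority_vote([tool_a, tool_b, tool_c])
    for name, pred in [("tool_a", tool_a), ("tool_b", tool_b),
                       ("tool_c", tool_c), ("ensemble", ensemble)]:
        scores = measures(y_true, pred)
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this deliberately constructed toy case, each hypothetical tool misses one true pre-miRNA and reports one false positive, but the errors fall on different candidates, so the vote recovers every label. Real detectors tend to make correlated errors, so any gain from an ensemble has to be measured on actual data rather than assumed.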