Complexity measures of the mature miRNA for improving pre-miRNA prediction

MOTIVATION: The discovery of microRNAs (miRNAs) in the last decade has changed the understanding of gene regulation in the cell. Although a large number of prediction algorithms using different features have been proposed, they still predict an impractically large number of false positives. Most of the proposed features are derived solely from the structure of the miRNA precursor (pre-miRNA) and ignore the relevant information contained in the mature miRNA. Features of this kind could improve the performance of predictors of novel miRNAs.

RESULTS: This paper presents three new features based on the sequence information contained in the mature miRNA. We show that these features significantly improve prediction performance, both when used by a classical supervised machine learning approach and when used by more recent deep learning proposals. Moreover, several experimental conditions were defined and tested to evaluate the impact of the novel features in situations close to genome-wide analysis. The results show that incorporating features based on the mature miRNA improves the detection of new miRNAs regardless of the classifier used.

AVAILABILITY: https://sourceforge.net/projects/sourcesinc/files/cplxmirna/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
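The abstract does not name the three mature-miRNA features. Because the title refers to complexity measures, the following Python sketch illustrates, purely as an assumption, two standard sequence-complexity measures that could be computed on a mature miRNA: the Shannon entropy of its nucleotide composition and its Lempel-Ziv (1976) complexity, i.e. the number of components in its exhaustive parsing. The function names are hypothetical and this is not the paper's implementation; the actual features are in the cplxmirna sources linked above.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol composition of seq.

    Illustrative complexity measure only; not necessarily one of the
    paper's three features.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # sum of p * log2(1/p) over observed symbols
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def lz76_complexity(seq: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive parsing.

    Each new component is extended while it can still be copied from the
    text preceding its last symbol; the component then ends with one new
    symbol (or at the end of the sequence).
    """
    n = len(seq)
    i = 0
    components = 0
    while i < n:
        length = 1
        # grow the component while seq[i:i+length] is reproducible
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


if __name__ == "__main__":
    # hypothetical input: a 22-nt mature miRNA sequence (let-7a)
    mature = "UGAGGUAGUAGGUUGUAUAGUU"
    print(shannon_entropy(mature), lz76_complexity(mature))
```

Higher entropy indicates a more balanced nucleotide composition, while a larger Lempel-Ziv count indicates fewer repeated substrings. Because mature miRNAs vary in length, the Lempel-Ziv count is often normalized by sequence length in practice; this sketch leaves it unnormalized.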
