EMDS: predicting essential miRNAs based on deep learning and sequences

MicroRNAs (miRNAs) are small noncoding RNAs of 19 to 24 nucleotides that play crucial roles in key biological processes associated with human diseases. Identifying the essentiality of miRNAs is therefore important for systematically understanding the pathogenic mechanisms of diseases. Because traditional biological experiments are both time- and labor-consuming, several computational methods have been developed to predict essential miRNAs. However, these methods use only the statistical and structural features of miRNA sequences. The temporal (order-dependent) characteristics of sequences should also be considered to improve prediction performance. In addition, the capability of deep learning models is well known. In this study, we therefore present a computational method, called EMDS, to predict essential miRNAs. EMDS uses not only the statistical and structural features of sequences but also subsequence features derived from the temporal characteristics of sequences with a Convolutional Neural Network (CNN). Furthermore, given the successful applications of attention mechanisms and the importance of subsequences within a miRNA sequence, we use a neural attention mechanism to obtain the subsequence features of miRNAs. Finally, we integrate the statistical, structural, and subsequence features into the final miRNA features, which are input to a Light Gradient Boosting Machine (LGBM) to predict essential miRNAs. We evaluate the prediction performance of our method with 5-fold cross-validation. We also compare EMDS with four competing methods, namely PESM, miES, Gaussian Naive Bayes (Gaus_NB), and Support Vector Machine (SVM), using the same cross-validation experiments. The results show that EMDS achieves better prediction performance in terms of AUC (EMDS: 0.9335, PESM: 0.9117, miES: 0.8837, Gaus_NB: 0.8720, SVM: 0.8571), which illustrates that our method can effectively predict essential miRNAs.
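To make two ingredients of the abstract more concrete, the following is a minimal sketch of a sequence-derived statistical feature and of the evaluation protocol (5-fold splits scored by AUC). It is not the EMDS implementation. The abstract does not say which statistical features are used, so k-mer frequencies are an assumption here, and the function names `kmer_frequencies`, `auc`, and `k_fold_splits` are hypothetical. The CNN, attention, and LGBM components are deliberately left out.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Normalized k-mer frequencies of an RNA sequence, in a fixed k-mer order.

    Assumption: k-mer frequencies stand in for the 'statistical features'
    named in the abstract, which does not define them.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:  # ignore windows with non-standard symbols
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_splits(n, k=5):
    """Yield (train_indices, test_indices) for k interleaved folds over range(n)."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    example = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt RNA string
    features = kmer_frequencies(example, k=2)
    print(len(features), "dinucleotide features")
    for train, test in k_fold_splits(10, k=5):
        print("test fold:", test)
```

In a full pipeline, a classifier would be fitted on each training fold and `auc` would be computed on that fold's held-out predictions. The pairwise AUC above is quadratic in the number of examples, which is acceptable for small datasets but not for large ones.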
