Distinguishing mirtrons from canonical miRNAs with data exploration and machine learning methods

Mirtrons are non-canonical microRNAs encoded in introns, whose biogenesis starts with splicing. They are not processed by Drosha and enter the canonical pathway at the Exportin-5 level. Mirtrons are also much less evolutionarily conserved than canonical miRNAs. Because of these differences, canonical miRNA predictors are not applicable to mirtron prediction. Identifying the differences is important for designing mirtron prediction algorithms and may improve the understanding of how mirtrons function. So far, only simple, single-feature comparisons have been reported, and these are insensitive to complex relations between features. We described miRNAs with 25 features and showed that the two miRNA species cannot be distinguished by a simple threshold on any single feature. Under Principal Component Analysis, however, mirtrons and canonical miRNAs group separately. Moreover, several methodologically diverse machine learning classifiers delivered high classification performance. Using feature selection algorithms, we found that some features previously reported to differ between the two classes (e.g. bulges in the stem region) did not improve classification accuracy, which suggests that they are not biologically meaningful. Finally, we proposed a combination of the most important features (including guanine content, hairpin free energy and hairpin length) that conveys a specific pattern crucial for identifying mirtrons.
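The contrast between single-feature thresholds and Principal Component Analysis can be illustrated with a minimal sketch. It uses synthetic toy data, not the miRNA data set: two correlated features stand in for the 25 features, and the class labels, offsets, noise levels and helper names (`best_threshold_accuracy`, `pca_scores`) are assumptions made only for this illustration. The two toy classes overlap heavily on each feature taken alone, yet separate along the second principal component.

```python
import numpy as np


def best_threshold_accuracy(values, labels):
    """Best accuracy achievable with a single cut-off on one 1-D variable."""
    best = 0.0
    for t in np.unique(values):
        acc = np.mean((values >= t) == labels)
        # A threshold may point either way, so also consider the flipped rule.
        best = max(best, acc, 1.0 - acc)
    return best


def pca_scores(X):
    """Project standardized data onto its principal components via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T


# Synthetic toy data (assumption): False = "canonical", True = "mirtron".
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([False, True], n)

# A shared component dominates both features, so each one alone overlaps
# strongly between classes; the class signal sits in their difference.
base = rng.normal(0.0, 3.0, size=2 * n)
offset = np.where(labels, 1.0, -1.0)
noise = rng.normal(0.0, 0.3, size=(2 * n, 2))
X = np.column_stack([base + offset, base - offset]) + noise

single_feature_acc = max(
    best_threshold_accuracy(X[:, j], labels) for j in range(X.shape[1])
)
scores = pca_scores(X)
pc2_acc = best_threshold_accuracy(scores[:, 1], labels)

print(f"best single-feature threshold accuracy: {single_feature_acc:.2f}")
print(f"threshold on second principal component: {pc2_acc:.2f}")
```

In this toy setting the first principal component captures the shared variation and the second captures the class difference. Nothing guarantees the same arrangement on real miRNA features, so component plots should be inspected rather than assumed.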
