miR-BAG: Bagging Based Identification of MicroRNA Precursors

Non-coding elements such as miRNAs play key regulatory roles in living systems. These ultra-short RNA molecules, ∼21 nt long, are derived from hairpin precursors and usually participate in negative gene regulation by binding target mRNAs. Discovering miRNA candidate regions across the genome has been a challenging problem, and most existing tools work reliably only on limited datasets. Here, we present miR-BAG, a novel and reliable approach that identifies miRNA candidate regions in genomes by scanning sequences as well as by using next generation sequencing (NGS) data. miR-BAG uses a bootstrap aggregation (bagging) based machine learning approach, building an ensemble of complementary learners to attain high accuracy while balancing sensitivity and specificity. miR-BAG was developed for a wide range of species and tested extensively on a wide range of experimentally validated data. Considering the position-specific variation of triplet structural profiles and mature-miRNA-anchored structural profiles had a positive impact on performance. The performance of miR-BAG was consistent, with accuracy >90% for most of the species considered in the present study. In a detailed comparative analysis, miR-BAG performed better than six existing tools. Using the miR-BAG NGS module, we identified a total of 22 novel miRNA candidate regions in the cow genome, in addition to a total of 42 cow-specific miRNA regions. In practice, discovering miRNA regions in a genome demands high-throughput data analysis and a large amount of processing. Considering this, miR-BAG has been developed with a multi-threaded parallel architecture, and is available as a web server as well as a user-friendly standalone GUI version.
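To make the bootstrap aggregation idea concrete, the following minimal Python sketch trains several simple learners on bootstrap resamples of a toy dataset and combines them by majority vote. It illustrates bagging in general and is not the miR-BAG implementation: the decision-stump learner, the two toy features, and every function name (`toy_data`, `train_stump`, `bagging_fit`, `bagging_predict`, `evaluate`) are assumptions introduced here for illustration.

```python
import random


def toy_data():
    """Hypothetical two-feature dataset (NOT real miRNA features).

    Feature 0 separates the classes; feature 1 is uninformative noise.
    Label 1 = precursor-like, label 0 = background.
    """
    data = []
    for i in range(10):
        data.append(((0.70 + 0.02 * i, (i * 7 % 10) / 10), 1))
        data.append(((0.10 + 0.02 * i, (i * 3 % 10) / 10), 0))
    return data


def predict_stump(stump, x):
    """Apply a one-feature threshold rule: (feature index, threshold, sign)."""
    f, t, sign = stump
    return 1 if sign * (x[f] - t) > 0 else 0


def train_stump(samples):
    """Exhaustively pick the feature/threshold/sign with the fewest errors."""
    best = None  # (errors, feature, threshold, sign)
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for sign in (1, -1):
                stump = (f, t, sign)
                errors = sum(predict_stump(stump, x) != y for x, y in samples)
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    return best[1:]


def bagging_fit(samples, n_learners=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    learners = []
    while len(learners) < n_learners:
        boot = [rng.choice(samples) for _ in samples]
        if len({y for _, y in boot}) < 2:
            continue  # redraw resamples that contain a single class
        learners.append(train_stump(boot))
    return learners


def bagging_predict(learners, x):
    """Aggregate the ensemble by majority vote."""
    votes = sum(predict_stump(s, x) for s in learners)
    return 1 if 2 * votes > len(learners) else 0


def evaluate(learners, samples):
    """Report sensitivity, specificity, and accuracy of the ensemble."""
    tp = tn = fp = fn = 0
    for x, y in samples:
        p = bagging_predict(learners, x)
        if y == 1 and p == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(samples),
    }


if __name__ == "__main__":
    data = toy_data()
    ensemble = bagging_fit(data)
    print(evaluate(ensemble, data))
```

Each learner sees a different resample, so the learners differ slightly, and voting averages out their individual errors. Reporting sensitivity and specificity alongside accuracy mirrors the balance emphasized above.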
