MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data

MOTIVATION: MicroRNAs (miRNAs) are a class of endogenous small RNAs that are derived from a precursor (pre-miRNA) and are involved in post-transcriptional regulation. Experimental identification of novel miRNAs is difficult because they are often transcribed only under specific conditions and in specific cell types. Several computational methods have been developed to detect new miRNAs, starting from known ones or from deep sequencing data, and to validate their pre-miRNAs.

RESULTS: We present a genome-wide search algorithm, called MIReNA, that looks for miRNA sequences by exploring a multidimensional space defined by only five (physical and combinatorial) parameters characterizing acceptable pre-miRNAs. MIReNA validates pre-miRNAs with high sensitivity and specificity, and detects new miRNAs by homology with known miRNAs or from deep sequencing data. We compare the performance of MIReNA with that of four available predictive systems. The MIReNA approach is strikingly simple, yet it proves at least as powerful as more sophisticated algorithmic methods. MIReNA obtains better results than three known algorithms that validate pre-miRNAs, which demonstrates that machine learning is not a necessary algorithmic approach for the computational validation of pre-miRNAs. In particular, machine learning algorithms can only confirm pre-miRNAs that resemble known ones, which is a limitation when exploring species with no known pre-miRNAs. A major feature of MIReNA is the possibility of adapting the search to specific species, whose miRNAs and pre-miRNAs may have specific properties. Parameter adjustment calibrates sensitivity and specificity in MIReNA, a key feature for predictive systems that is not present in machine learning approaches. A comparison of MIReNA with miRDeep on the prediction of miRNAs from deep sequencing data highlights the highly specific predictive power of MIReNA.

AVAILABILITY: MIReNA is available at http://www.ihes.fr/carbone/data8/.
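To make the sensitivity/specificity trade-off mentioned above concrete, the following is a minimal sketch of how moving a single acceptance threshold shifts both measures. It is not MIReNA's implementation: the function name, the scores, and the single-threshold rule are hypothetical and serve only to illustrate the definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores_pos: Sequence[float],
    scores_neg: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a hypothetical acceptance rule.

    A candidate is accepted when its score is >= threshold.
    scores_pos: scores of true pre-miRNAs (illustrative values only).
    scores_neg: scores of non-pre-miRNA hairpins (illustrative values only).
    """
    tp = sum(1 for s in scores_pos if s >= threshold)  # true positives
    fn = len(scores_pos) - tp                          # false negatives
    tn = sum(1 for s in scores_neg if s < threshold)   # true negatives
    fp = len(scores_neg) - tn                          # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores, not data from the paper.
POS = [0.9, 0.8, 0.6, 0.4]
NEG = [0.7, 0.3, 0.2, 0.1]

for t in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(POS, NEG, t)
    print(f"threshold={t}: sensitivity={sens}, specificity={spec}")
```

On these made-up scores, raising the threshold increases specificity at the cost of sensitivity, which is the kind of calibration the abstract attributes to parameter adjustment.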
