HHMMiR: efficient de novo prediction of microRNAs using hierarchical hidden Markov models

Background
MicroRNAs (miRNAs) are small, non-coding, single-stranded RNAs (20–23 nts) that are known to act as post-transcriptional and translational regulators of gene expression. Although they were initially overlooked, their role in many important biological processes, such as development, cell differentiation, and cancer, has been established in recent years. Despite their biological significance, the identification of miRNA genes in newly sequenced organisms still relies heavily on evolutionary conservation, which is not always available.

Results
We have developed HHMMiR, a novel approach for de novo miRNA hairpin prediction in the absence of evolutionary conservation. Our method implements a Hierarchical Hidden Markov Model (HHMM) that uses region-based structural as well as sequence information of miRNA precursors. We first established a template for the structure of a typical miRNA hairpin by summarizing data from publicly available databases. We then used this template to develop the HHMM topology.

Conclusion
Our algorithm achieved an average sensitivity of 84% and specificity of 88% in 10-fold cross-validation on human miRNA precursor data. We also show that this model, trained on human sequences, performs well on hairpins from other vertebrate as well as invertebrate species. Furthermore, the human-trained model correctly classified approximately 97% of plant miRNA precursors. The success of this approach across such a diverse set of species indicates that sequence conservation is not necessary for miRNA prediction. This may enable efficient prediction of miRNA genes in virtually any organism.
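The abstract describes two technical steps: scoring candidate hairpins with a trained hidden Markov model, and evaluating the resulting calls by sensitivity and specificity. The sketch below illustrates both under strong simplifying assumptions. It is not HHMMiR's model. It uses a flat two-state HMM instead of a hierarchical one. It emits only dot-bracket structure characters, whereas the real model combines region-based structural and sequence information. Its probabilities are hand-set rather than trained. A candidate is scored by the log-odds of a hypothetical "hairpin" model against a uniform "background" model, with likelihoods computed by the forward algorithm in log space. Sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP). The names `FlatHMM`, `log_odds`, `HAIRPIN`, and `BACKGROUND` are illustrative only.

```python
import math
from typing import Dict, Sequence, Tuple

# Toy, hand-set parameters for illustration only. HHMMiR's actual model is a
# hierarchical HMM trained on human miRNA precursor data; this sketch is not.
Probs = Dict[str, float]


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


class FlatHMM:
    """A flat (non-hierarchical) HMM with discrete emissions."""

    def __init__(self, start: Probs, trans: Dict[str, Probs], emit: Dict[str, Probs]):
        self.start = start  # P(first state)
        self.trans = trans  # P(next state | current state)
        self.emit = emit    # P(symbol | state)

    def log_likelihood(self, seq: str) -> float:
        """Forward algorithm in log space: returns log P(seq | model)."""
        states = list(self.start)
        alpha = {s: _log(self.start[s]) + _log(self.emit[s][seq[0]]) for s in states}
        for sym in seq[1:]:
            alpha = {
                t: _logsumexp([alpha[s] + _log(self.trans[s][t]) for s in states])
                + _log(self.emit[t][sym])
                for t in states
            }
        return _logsumexp(list(alpha.values()))


def log_odds(seq: str, hairpin: FlatHMM, background: FlatHMM) -> float:
    """Positive values favour the hairpin model over the background model."""
    return hairpin.log_likelihood(seq) - background.log_likelihood(seq)


def sensitivity_specificity(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical models over dot-bracket symbols: a two-state "stem"/"loop"
# hairpin model and a single-state uniform background model.
HAIRPIN = FlatHMM(
    start={"stem": 0.9, "loop": 0.1},
    trans={"stem": {"stem": 0.9, "loop": 0.1}, "loop": {"stem": 0.2, "loop": 0.8}},
    emit={
        "stem": {"(": 0.45, ")": 0.45, ".": 0.1},
        "loop": {"(": 0.05, ")": 0.05, ".": 0.9},
    },
)
BACKGROUND = FlatHMM(
    start={"bg": 1.0},
    trans={"bg": {"bg": 1.0}},
    emit={"bg": {"(": 1 / 3, ")": 1 / 3, ".": 1 / 3}},
)
```

For example, `log_odds("((((....))))", HAIRPIN, BACKGROUND)` is positive, so with an assumed decision threshold of 0 this structure would be called a hairpin. In a 10-fold cross-validation like the one reported above, the models would be retrained on nine folds and the calls on the held-out fold passed to `sensitivity_specificity`, with the per-fold values then averaged.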
