Unique folding of precursor microRNAs: quantitative evidence and implications for de novo identification.

MicroRNAs (miRNAs) participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. The hairpin is a crucial structural feature for the computational identification of precursor miRNAs (pre-miRs), as its formation is critically associated with the early stages of mature miRNA biogenesis. Our incomplete knowledge of the number of miRNAs present in the genomes of vertebrates, worms, plants, and even viruses necessitates a thorough understanding of their sequence motifs, hairpin structural characteristics, and topological descriptors. In this in-depth study, we investigate a comprehensive and heterogeneous collection of 2,241 published (nonredundant) pre-miRs across 41 species (miRBase 8.2), 8,494 pseudohairpins extracted from the human RefSeq genes, 12,387 (nonredundant) ncRNAs spanning 457 types (Rfam 7.0), 31 full-length mRNAs randomly selected from GenBank, and four sets of synthetically generated genomic background corresponding to each of the native RNA sequences. Our large-scale characterization analysis reveals that pre-miRs are significantly different from other types of ncRNAs, pseudohairpins, mRNAs, and genomic background according to the nonparametric Kruskal-Wallis ANOVA (p < 0.001). We examine the intrinsic and global features at the sequence, structural, and topological levels, including %G+C content, normalized base-pairing propensity P(s), normalized minimum free energy of folding MFE(s), normalized Shannon entropy Q(s), normalized base-pair distance D(s), and degree of compactness F(s), as well as the corresponding Z scores of P(s), MFE(s), Q(s), D(s), and F(s). The findings will promote more accurate guidelines and distinctive criteria for the prediction of novel pre-miRs with improved performance.
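As a rough illustration of three of the quantities named above, the following Python sketch computes %G+C content, a length-normalized base-pairing propensity from a dot-bracket structure string, and a Z score of a native feature value against values obtained from background sequences. The function names, the dot-bracket input, and the use of the sample standard deviation are assumptions made for this sketch and are not taken from the study; the structure and the background values would in practice come from a folding program, which is not called here.

```python
from statistics import mean, stdev
from typing import Sequence


def gc_content(seq: str) -> float:
    """Percentage of G and C nucleotides in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def pairing_propensity(dot_bracket: str) -> float:
    """Base pairs per nucleotide, read from a dot-bracket structure.

    Each '(' marks one base pair; the count is divided by the length.
    """
    return dot_bracket.count("(") / len(dot_bracket)


def z_score(native: float, background: Sequence[float]) -> float:
    """Z score of a native value against background values.

    Assumption: the sample standard deviation is used as the spread.
    """
    return (native - mean(background)) / stdev(background)


# Toy inputs, purely hypothetical.
toy_seq = "GGGAAACCC"
toy_structure = "(((...)))"
toy_gc = gc_content(toy_seq)
toy_propensity = pairing_propensity(toy_structure)
toy_z = z_score(-0.5, [-0.3, -0.2, -0.4])
```

A strongly negative Z score for a normalized folding energy would indicate that the native sequence folds more stably than its background counterparts, which is the kind of contrast the feature set above is meant to capture.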
