Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine

Background: MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.

Results: A set of novel features based on local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs from those of pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without using any comparative genomics information.

Conclusion: The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs offers a new approach for discovering new miRNAs.
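The abstract does not spell out how a "local contiguous structure-sequence" feature is encoded, so the following is only a minimal sketch under stated assumptions. It assumes that each feature pairs a nucleotide with the paired/unpaired status of itself and its two neighbours, read from a dot-bracket secondary structure, which gives 4 x 8 = 32 possible features. The names `triplet_features` and `FEATURES` are hypothetical, and counting over the whole sequence (rather than a selected part of the hairpin) is a simplification chosen for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Assumed structure status per position: "(" = paired, "." = unpaired.
STRUCT_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]
# Hypothetical feature names such as "A((." (middle nucleotide + 3 statuses).
FEATURES = [n + s for n in NUCLEOTIDES for s in STRUCT_TRIPLETS]  # 4 * 8 = 32


def triplet_features(seq, struct):
    """Return normalized frequencies of the 32 assumed triplet features.

    seq    -- RNA (or DNA) sequence of the candidate hairpin
    struct -- dot-bracket structure of the same length, e.g. "((..))"
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    # Treat both bracket directions as the same "paired" status.
    status = struct.replace(")", "(")
    counts = dict.fromkeys(FEATURES, 0)
    # Slide over every position that has both a left and a right neighbour.
    for i in range(1, len(seq) - 1):
        key = seq[i] + status[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases or unknown symbols
            counts[key] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


feats = triplet_features("GCAUGC", "((..))")
vector = [feats[name] for name in FEATURES]  # fixed-order 32-dim vector
```

A fixed-order vector like `vector` is the kind of input an SVM classifier could be trained on, with real pre-miRNA hairpins as positives and pseudo hairpins as negatives. The dot-bracket structure itself would have to come from an RNA folding program, which is outside this sketch.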
