Using a kernel density estimation based classifier to predict species-specific microRNA precursors

Background: MicroRNAs (miRNAs) are short non-coding RNA molecules that participate in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, ab initio approaches have attracted more attention because they can discover species-specific pre-miRNAs. Most ab initio approaches have proposed novel features to characterize RNA molecules. However, the classification mechanism used in a miRNA predictor has received less discussion.

Results: This study focuses on the classification algorithm for miRNA prediction. We develop a novel ab initio method, miR-KDE, in which most of the features are collected from previous works. The classification mechanism in miR-KDE is the relaxed variable kernel density estimator (RVKDE) that we have recently proposed. Compared with the well-known support vector machine (SVM), RVKDE exploits more local information from the training dataset. miR-KDE is evaluated using a training set consisting of only human pre-miRNAs to predict a benchmark collected from 40 species. The experimental results show that miR-KDE delivers favorable performance in predicting human pre-miRNAs and has advantages for pre-miRNAs from genera taxonomically distant from humans.

Conclusion: We use a novel classifier whose ability to exploit local information makes it particularly suitable for predicting species-specific pre-miRNAs. This study also provides a comprehensive analysis from the viewpoint of the classification mechanism. The good performance of miR-KDE encourages further work on classification methodology, as well as feature extraction, in miRNA prediction.
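To make the idea of a kernel density estimation based classifier that exploits local information more concrete, the following is a minimal sketch of a generic variable-bandwidth Gaussian KDE classifier. It is not the RVKDE algorithm of miR-KDE, whose details the abstract does not give. The function names, the parameters `k` and `beta`, the class labels, and the two-dimensional toy feature vectors are all hypothetical and chosen only for illustration. Each training sample gets its own bandwidth, set from the distance to its k-th nearest neighbor in the same class, so the density estimate adapts to how tightly samples cluster locally.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_variable_kde(samples, labels, k=2, beta=1.0):
    """Store each training point with a per-point bandwidth.

    Assumption for this sketch: the bandwidth of a point is beta times the
    distance to its k-th nearest neighbor within the same class.
    """
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(tuple(x))

    model = {}
    for label, points in by_class.items():
        entries = []
        for i, p in enumerate(points):
            dists = sorted(_dist(p, q) for j, q in enumerate(points) if j != i)
            # Fall back to 1.0 when a class has a single sample.
            kth = dists[min(k, len(dists)) - 1] if dists else 1.0
            entries.append((p, max(beta * kth, 1e-6)))
        model[label] = entries
    return model, len(samples)


def class_score(entries, total, x):
    """Class prior times class-conditional density at x.

    prior = n_c / total and density = (sum of kernels) / n_c,
    so the product reduces to (sum of kernels) / total.
    """
    dim = len(x)
    acc = 0.0
    for p, sigma in entries:
        norm = (2.0 * math.pi * sigma ** 2) ** (-dim / 2.0)
        acc += norm * math.exp(-_dist(x, p) ** 2 / (2.0 * sigma ** 2))
    return acc / total


def predict(model, total, x):
    """Return the label with the highest prior-weighted density at x."""
    return max(model, key=lambda label: class_score(model[label], total, x))


# Toy data, invented for illustration only (not from the paper).
train_x = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4),
           (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (4.6, 4.7)]
train_y = ["pre-miRNA"] * 4 + ["pseudo"] * 4

model, total = fit_variable_kde(train_x, train_y, k=2)
print(predict(model, total, (0.2, 0.1)))
print(predict(model, total, (4.8, 5.1)))
```

Because each kernel is narrow where same-class samples are dense and wide where they are sparse, the decision at a query point is dominated by nearby training samples. That is the sense in which such a classifier uses local information.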
