MiRTif: a support vector machine-based microRNA target interaction filter

Background: MicroRNAs (miRNAs) are small non-coding RNAs that serve as important negative gene regulators. In animals, miRNAs repress protein translation by binding to the 3' UTRs of target genes with imperfect complementary pairing. Identifying miRNA targets has become one of the major challenges of miRNA research. Bioinformatics investigations of miRNA targets have produced a number of target prediction tools. Although these tools can predict hundreds of targets for a given miRNA, many of them suffer from high false positive rates, indicating the need for a post-processing filter for the predicted targets. Once trained with experimentally validated true and false targets, machine learning methods appear to be well suited to distinguishing the true targets from the false ones.

Results: We present a miRNA target filtering system named MiRTif (miRNA:target interaction filter). The system is a support vector machine (SVM) classifier trained with 195 positive and 38 negative miRNA:target interaction pairs, all experimentally validated. Each miRNA:target interaction pair is divided into a seed and a non-seed region. The encoded feature vector contains various k-gram frequencies in the seed region, the non-seed region, and the entire interaction. Informative features are selected based on their discriminating ability. Prediction accuracy is assessed using 10-fold cross-validation. Our system achieves an AUC (area under the ROC curve) of 0.86, a sensitivity of 83.59%, and a specificity of 73.68%. More importantly, the system correctly identifies the majority of the false positive miRNA:target interactions (28 out of 38). The possibility of over-fitting due to the relatively small negative sample set has also been investigated using a set of non-validated, randomly selected targets from miRBase.

Conclusion: MiRTif is designed as a post-processing filter that takes as input miRNA:target interactions predicted by other target prediction software, such as TargetScanS, PicTar and miRanda, and estimates how likely a given interaction is to be real rather than a pseudo one. MiRTif can be accessed from http://bsal.ym.edu.tw/mirtif.
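
The k-gram frequency encoding described in the Results can be illustrated with a minimal sketch. It is not MiRTif's actual implementation: the abstract does not specify how an interaction pair is represented, so the four-symbol alphabet (match, wobble, mismatch, gap), the seed length of 8, the choice of k = 1 and 2, and the function names `kgram_frequencies` and `encode_pair` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical per-position symbols for an aligned miRNA:target duplex:
# M = Watson-Crick match, W = G:U wobble, X = mismatch, G = gap.
ALPHABET = "MWXG"
SEED_LENGTH = 8  # assumed seed length, not taken from the paper


def kgram_frequencies(s, k, alphabet=ALPHABET):
    """Return the relative frequency of every possible k-gram over `alphabet` in `s`."""
    total = max(len(s) - k + 1, 0)  # number of k-gram windows in s
    counts = Counter(s[i:i + k] for i in range(total))
    return [counts["".join(gram)] / total if total else 0.0
            for gram in product(alphabet, repeat=k)]


def encode_pair(interaction, ks=(1, 2), seed_length=SEED_LENGTH):
    """Concatenate k-gram frequencies of the seed, non-seed and entire regions."""
    seed = interaction[:seed_length]
    non_seed = interaction[seed_length:]
    features = []
    for region in (seed, non_seed, interaction):
        for k in ks:
            features.extend(kgram_frequencies(region, k))
    return features


# A made-up interaction string, purely for demonstration.
vector = encode_pair("MMMMMMMWXXMMGMMXMMWMXM")
print(len(vector))  # 3 regions * (4 one-grams + 16 two-grams) = 60
```

A vector of this form could then be passed to any SVM implementation; feature selection, as mentioned in the Results, would retain only the most discriminating entries.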
