TargetSpy: a supervised machine learning approach for microRNA target prediction

Background
Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites.

Results
We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences.

To allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs, our method shows superior performance in all classes. In Drosophila melanogaster, not only are our class II and III predictions on par with other algorithms, but notably the class I (no-seed) predictions are only marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms.

Conclusion
Only a few algorithms can predict target sites without demanding a seed match, and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool (http://www.targetspy.org).
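The three comparison classes described in the Results can be illustrated with a small postfiltering sketch. This is not TargetSpy's implementation: the abstract does not state which seed definition or conservation criterion is used, so the sketch assumes a 6-mer seed at microRNA positions 2-7 with perfect Watson-Crick pairing and a precomputed, hypothetical `conserved` flag on each prediction. The function names and the prediction record layout are likewise assumptions made for illustration.

```python
# Illustrative sketch only; names, seed definition, and record layout are assumptions.

def to_rna(seq: str) -> str:
    """Normalize a sequence to upper-case RNA letters (T is read as U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(to_rna(rna)))


def has_seed_match(mirna: str, site: str, start: int = 2, end: int = 7) -> bool:
    """Check whether the site contains a perfect match to the microRNA seed.

    The seed is taken as positions start..end (1-based, inclusive) from the
    5' end of the microRNA; 2-7 is an assumed default, not a value taken
    from the abstract.
    """
    seed = to_rna(mirna)[start - 1:end]
    return reverse_complement(seed) in to_rna(site)


def postfilter(predictions, mirna, require_seed=False, require_conserved=False):
    """Filter class I predictions down to class II or class III.

    Each prediction is a dict with a 'site' sequence and an optional
    'conserved' boolean (a hypothetical, precomputed flag).
    """
    kept = []
    for pred in predictions:
        if require_seed and not has_seed_match(mirna, pred["site"]):
            continue
        if require_conserved and not pred.get("conserved", False):
            continue
        kept.append(pred)
    return kept


# Toy data: let-7a sequence and three made-up candidate sites.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
PREDICTIONS = [
    {"site": "AAACUACCUCAAA", "conserved": True},
    {"site": "AAACUACCUCAAA", "conserved": False},
    {"site": "AAAGGGGGGAAA", "conserved": True},
]

class_1 = postfilter(PREDICTIONS, LET7A)
class_2 = postfilter(PREDICTIONS, LET7A, require_seed=True)
class_3 = postfilter(PREDICTIONS, LET7A, require_seed=True, require_conserved=True)
```

Under these assumptions, class I keeps every prediction, class II keeps only sites with a seed match, and class III additionally requires the conservation flag, so the same underlying predictor can be compared against algorithms in each group.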
