Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms.

We present a new microRNA target prediction algorithm called TargetBoost, and show that the algorithm is stable and identifies more true targets than existing algorithms do. TargetBoost uses machine learning on a set of validated microRNA targets in lower organisms to create weighted sequence motifs that capture the binding characteristics between microRNAs and their targets. Existing algorithms require candidates to have (1) near-perfect complementarity between the microRNA's 5' end and its target; (2) relatively high thermodynamic duplex stability; (3) multiple target sites in the target's 3' UTR; and (4) evolutionary conservation of the target between species. Most algorithms use one of the first two requirements as a seeding step and the remaining three as filters to improve the method's specificity. The initial seeding step determines an algorithm's sensitivity and also influences its specificity. Because any algorithm can add filters to increase specificity, we propose that methods should be compared before such filtering. We show that TargetBoost's weighted sequence motif approach compares favorably with both the duplex stability and the sequence complementarity seeding steps. (TargetBoost is available as a Web tool from http://www.interagon.com/demo/.)
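
To make the contrast between a strict complementarity seed and a weighted sequence motif concrete, the following is a minimal Python sketch. It is not TargetBoost's learned model: the position weights, the half credit for G:U wobble pairs, the threshold, and all function names are hypothetical choices made for illustration, and the 8-nt microRNA fragment and UTR are toy sequences.

```python
# Illustrative only: weights, wobble credit, and threshold are assumed values,
# not parameters learned or published by TargetBoost.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(m, t, wobble_credit=0.5):
    """Return 1.0 for a Watson-Crick pair, partial credit for G:U, else 0.0."""
    if (m, t) in WATSON_CRICK:
        return 1.0
    if (m, t) in WOBBLE:
        return wobble_credit
    return 0.0


def motif_score(mirna_5p, window, weights):
    """Weighted pairing score of a UTR window against the microRNA 5' end.

    The duplex is antiparallel, so microRNA position k (5'->3') faces
    window position n-1-k (window written 5'->3').
    """
    n = len(weights)
    return sum(weights[k] * pair_score(mirna_5p[k], window[n - 1 - k])
               for k in range(n))


def perfect_seed_match(mirna_5p, window, start=1, end=8):
    """Strict seed: Watson-Crick pairing at microRNA positions 2-8 (0-based 1-7)."""
    n = len(window)
    return all((mirna_5p[k], window[n - 1 - k]) in WATSON_CRICK
               for k in range(start, end))


def seed_candidates(mirna, utr, weights, threshold):
    """Slide over the UTR and keep (index, score) for windows at or above threshold."""
    n = len(weights)
    hits = []
    for i in range(len(utr) - n + 1):
        score = motif_score(mirna[:n], utr[i:i + n], weights)
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy inputs (hypothetical).
mirna = "UGAGGUAG"
weights = [0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5]
perfect_site = "CUACCUCA"   # reverse complement of the microRNA fragment
wobble_site = "CUACUUCA"    # one G:U wobble inside the seed

hits = seed_candidates(mirna, "AAAA" + perfect_site + "GGGG", weights, 6.0)
```

In this toy setting the wobble-containing site fails the strict seed test but still passes the weighted threshold, which is the kind of sensitivity difference a seeding step controls. Filters such as duplex stability, site multiplicity, or conservation would be applied only after this step.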
