Prediction of piRNAs using transposon interaction and a support vector machine

Background: Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs, expressed primarily in germ cells, that can silence transposons at the post-transcriptional level. Accurate prediction of piRNAs remains a significant challenge.

Results: We developed a program for piRNA annotation (Piano) that uses piRNA-transposon interaction information. We downloaded 13,848 Drosophila piRNAs and 261,500 Drosophila transposons. The piRNAs were aligned to the transposons, allowing a maximum of three mismatches, and piRNA-transposon interactions were then predicted with RNAplex. Triplet elements combining structure and sequence information were extracted from the piRNA-transposon matching/pairing duplexes. A support vector machine (SVM) trained on these triplet elements classified real and pseudo piRNAs with 95.3 ± 0.33% accuracy and 96.0 ± 0.5% sensitivity. The same SVM classifier correctly predicted human, mouse and rat piRNAs with an overall accuracy of 90.6%. We then used Piano to predict piRNAs for the rice stem borer, Chilo suppressalis, an important insect pest of rice that causes substantial yield loss; 82,639 piRNAs were predicted in C. suppressalis.

Conclusions: Piano achieves high piRNA prediction performance by using both structure and sequence features of piRNA-transposon interactions. Piano is freely available to the academic community at http://ento.njau.edu.cn/Piano.html.
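The abstract does not spell out how a triplet element is encoded, so the following is only a minimal sketch under stated assumptions. It assumes each element combines the nucleotide at a position with the paired/unpaired status of that position and its two neighbours in the piRNA-transposon duplex, which gives 4 × 2³ = 32 possible elements. The function name `triplet_features`, the `(`/`.` pairing notation, and the frequency normalization are illustrative choices, not Piano's actual implementation.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = "(."  # assumed notation: '(' = paired with the transposon, '.' = unpaired

# 4 middle nucleotides x 2^3 pairing patterns = 32 possible triplet elements
TRIPLET_KEYS = [
    nt + "".join(pattern)
    for nt in NUCLEOTIDES
    for pattern in product(STATES, repeat=3)
]


def triplet_features(seq, pairing):
    """Return a 32-dimensional frequency vector of triplet elements.

    seq     -- piRNA sequence (DNA or RNA alphabet; T is treated as U)
    pairing -- string of the same length, '(' for paired and '.' for unpaired
    """
    if len(seq) != len(pairing):
        raise ValueError("sequence and pairing string must have equal length")

    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRIPLET_KEYS, 0)
    total = 0

    # Slide a window of three positions; the key is the middle nucleotide
    # followed by the pairing status of the three positions.
    for i in range(1, len(seq) - 1):
        key = seq[i] + pairing[i - 1:i + 2]
        if key in counts:  # windows with ambiguous symbols (e.g. N) are skipped
            counts[key] += 1
            total += 1

    # Normalize to frequencies so piRNAs of different lengths are comparable.
    return [counts[k] / total if total else 0.0 for k in TRIPLET_KEYS]


if __name__ == "__main__":
    # Toy duplex: position 3 of the piRNA is a mismatch against the transposon.
    vector = triplet_features("UGAC", "((.(")
    print(len(vector), sum(vector))
```

Vectors of this kind, computed for real and pseudo piRNAs, would then serve as input to an SVM classifier; that training step is omitted here because the abstract gives no kernel or parameter settings.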
