A k-mer scheme to predict piRNAs and characterize locust piRNAs

Motivation: Identifying piwi-interacting RNAs (piRNAs) in non-model organisms is a difficult and unsolved problem, because piRNAs lack conserved secondary structure motifs and show no sequence homology across species.

Results: In this article, a k-mer scheme is proposed to identify piRNA sequences. It is trained on non-piRNA and piRNA sequences from five sequenced model species: rat, mouse, human, fruit fly and nematode. Compared with the existing ‘static’ scheme based on position-specific base usage, our novel ‘dynamic’ algorithm performs much better, with a precision of over 90% and a sensitivity of over 60%; the precision is verified by 5-fold cross-validation in these species. To test its validity, we use the algorithm to identify piRNAs of the migratory locust from 603 607 deep-sequenced small RNA sequences. In total, 87 536 locust piRNAs are predicted, and 4426 of them match existing locust transposons. We describe the transcriptional difference between solitary and gregarious locusts. We also revisit the position-specific base usage of piRNAs and find conservation at the end of piRNAs. The method we developed can therefore be used to identify piRNAs of non-model organisms without complete genome sequences.

Availability: The web server implementing the algorithm and the software code are freely available to the academic community at http://59.79.168.90/piRNA/index.php.

Contact: lkang@ioz.ac.cn

Supplementary information: Supplementary data are available at Bioinformatics online.
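As an illustration of what a k-mer representation of a small RNA can look like, the following Python sketch turns a sequence into a fixed-length vector of k-mer frequencies. It is not the authors' implementation: the abstract does not state the range of k, the normalization or the classifier, so the function names (`kmer_index`, `kmer_frequencies`), the default `k_max=3` and the per-k normalization are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style reads are converted below


def kmer_index(k_max):
    """Return every k-mer for k = 1..k_max in a fixed, reproducible order."""
    return ["".join(p)
            for k in range(1, k_max + 1)
            for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq, k_max=3):
    """Map a sequence to relative k-mer frequencies for k = 1..k_max.

    Assumption for this sketch: counts for each k are divided by the
    number of overlapping windows of length k, so sequences of different
    lengths yield comparable feature vectors.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for k in range(1, k_max + 1):
        windows = max(len(seq) - k + 1, 0)
        counts = {}
        for i in range(windows):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        for p in product(ALPHABET, repeat=k):
            kmer = "".join(p)
            # k-mers that never occur (or contain N) contribute 0.0
            features.append(counts.get(kmer, 0) / windows if windows else 0.0)
    return features


if __name__ == "__main__":
    names = kmer_index(2)
    vector = kmer_frequencies("UGACCUAGGAUCCAUGGACUUAGC", k_max=2)
    for name, value in zip(names, vector):
        if value > 0:
            print(name, round(value, 3))
```

Vectors of this kind, computed for known piRNA and non-piRNA sequences, could then be passed to any standard classifier. The choice of classifier is left open here because the abstract does not specify it.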
