Bioinformatics Advance Access published September 10, 2009

De Novo Computational Prediction of Non-coding RNA Genes in

Motivation: The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues.

Results: We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation.

Availability: The source code and results are available at http://csbl.bmb.uga.edu/publications/materials/tran/.

Contact: xyn@bmb.uga.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
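The window-based evaluation described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size and step, and the rule for merging overlapping positive windows are all assumptions made for illustration, and the positional filtering strategies mentioned above are not reproduced. Specificity is computed here as TN / (TN + FP); the paper may define it differently (gene-prediction work sometimes uses TP / (TP + FP)).

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) on the genome


def sliding_windows(seq_len: int, size: int, step: int) -> List[Window]:
    """Enumerate fixed-size windows along a sequence (hypothetical helper)."""
    return [(s, s + size) for s in range(0, seq_len - size + 1, step)]


def merge_windows(windows: List[Window]) -> List[Window]:
    """Merge overlapping or touching windows into candidate regions.

    Assumption: windows of different sizes that a classifier scored as
    positive are combined by simple interval union.
    """
    merged: List[Window] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Extend the previous region when the new window overlaps it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from window-level counts.

    Assumed definitions: sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP).
    """
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec
```

With hypothetical counts of 68 true positives, 32 false negatives, 70 true negatives and 30 false positives, `sensitivity_specificity(68, 30, 70, 32)` gives values matching the 68% and 70% figures quoted above; the counts themselves are invented for the example.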
