A Structure-Based Flexible Search Method for Motifs in RNA

The discovery of non-coding RNA (ncRNA) motifs and their role in regulating gene expression has recently attracted considerable attention. The goal is to discover these motifs in a sequence database. Current RNA motif search methods start from the primary sequence and only then take secondary structure into account. An alternative is a flexible structure-based motif search method that filters datasets by secondary structure first, while allowing extensive primary sequence constraints and additional constraints such as potential pseudoknots. Since motifs vary in structural rigidity and in local sequence constraints, there is a need for algorithms and tools that can be fine-tuned to the RNA motif being searched, but that differ in approach from the RNAMotif descriptor language. We present an RNA motif search tool called STRMS (Structural RNA Motif Search), which takes as input the secondary structure of the query, including local sequence and structure constraints, and a target sequence database. It reports all occurrences of the query in the target, ranked by their similarity to the query, and produces an HTML file that displays graphical images of the predicted structures for both the query and the candidate hits. The tool is flexible: it supports a large number of sequence options and accounts for the existence of potential pseudoknots, as dictated by the specific query. Our approach combines pre-folding with an O(m n) RNA pattern matching algorithm based on subtree homeomorphism for ordered, rooted trees. We also describe an O(n^2 log n) extension that allows the search engine to take into account the pseudoknots typical of riboswitches. We employed STRMS to search for both new and known RNA motifs (riboswitches and tRNAs) in large target databases.
Our results point to a number of additional purine bacterial riboswitch candidates in newly sequenced bacteria, and demonstrate high sensitivity on known riboswitches and tRNAs. Code and data are available at www.cs.bgu.ac.il/vaksler/STRMS.
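
To illustrate the idea of matching a query structure against a pre-folded target through subtree homeomorphism on ordered, rooted trees, the following is a minimal Python sketch. It is not the STRMS implementation: it assumes structures are given in dot-bracket notation, keeps only base pairs as tree nodes (unpaired bases, sequence constraints, pseudoknots and ranking are all ignored), and uses a simple memoized recursion with a greedy left-to-right child assignment. The names `Node`, `parse_dot_bracket` and `occurs` are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """One base pair (or the virtual root) with its ordered children."""
    children: list = field(default_factory=list)


def parse_dot_bracket(structure):
    """Build an ordered rooted tree from a dot-bracket string.

    Each '(' ... ')' pair becomes a node; '.' (unpaired) is skipped.
    A virtual root collects the top-level stems.
    """
    root = Node()
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = Node()
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


def occurs(pattern, target):
    """Return True if the pattern tree is homeomorphic to some subtree of
    the target tree: pattern nodes map to target nodes, pattern edges map
    to downward paths, and sibling order is preserved."""
    rooted_memo = {}
    below_memo = {}

    def rooted(p, t):
        # p is mapped exactly onto t; the children of p must be embedded,
        # in order, under distinct children of t (greedy earliest match).
        key = (id(p), id(t))
        if key not in rooted_memo:
            matched = 0
            for child in t.children:
                if matched < len(p.children) and below(p.children[matched], child):
                    matched += 1
            rooted_memo[key] = matched == len(p.children)
        return rooted_memo[key]

    def below(p, t):
        # p is mapped onto t or onto some descendant of t.
        key = (id(p), id(t))
        if key not in below_memo:
            below_memo[key] = rooted(p, t) or any(below(p, c) for c in t.children)
        return below_memo[key]

    return below(pattern, target)


if __name__ == "__main__":
    query = parse_dot_bracket("(((...))((...)))")            # stem closing two hairpins
    hit = parse_dot_bracket("((..((...))..((...))..))")      # contains that branching
    miss = parse_dot_bracket("((((....))))")                 # single hairpin only
    print(occurs(query, hit), occurs(query, miss))
```

Because edges of the query may map to paths in the target, a query stem can match a longer target stem, which is one way to tolerate variation in stem length; a real search tool would additionally score candidates and apply the sequence constraints described above.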
