Efficient RNA structure comparison algorithms

The recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features that allow an RNA structure database to be stored in a suffix array. A fast substructure search algorithm based on binary search on this suffix array has already been proposed. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of multiple given RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduce a new problem for comparing multiple RNA structures, with a stricter similarity definition and objective, and we propose an algorithm that solves it efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in the compared RNAs. With the resulting tools, we have improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan). The website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input to the search tool, and another for automatically drawing an entire RNA structure from a given structure sequence.
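As a rough illustration of the binary-search idea mentioned above, the following minimal sketch looks up a pattern in a suffix array built over an ordinary string. It is not the authors' algorithm: the plain string stands in for an encoded structure sequence whose actual format is not specified here, the naive construction is chosen only for brevity, and the function names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically.

    Assumption: `text` is a plain string standing in for an encoded
    structure sequence. Real tools would use a faster construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of `pattern` in `text`.

    Uses two binary searches over the suffix array `sa`: one for the first
    suffix whose length-m prefix is >= pattern, and one for the first
    suffix whose length-m prefix is > pattern.
    """
    m = len(pattern)

    # Lower bound of the block of suffixes that start with `pattern`.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound of the same block.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))  # positions where "ana" starts
```

Because the suffixes are sorted, all suffixes that begin with the pattern form one contiguous block in the array, so each query needs only a logarithmic number of prefix comparisons.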
