A new algorithm for computing similarity between RNA structures

Abstract: The primary structure of a ribonucleic acid (RNA) molecule is a sequence of nucleotides (bases) over the four-letter alphabet {A, C, G, U}. The secondary or tertiary structure of an RNA is a set of base pairs (nucleotide pairs) bonded as A–U or C–G. For secondary structures, these bonds have traditionally been assumed to be one-to-one and non-crossing. We consider the edit distance between two RNA structures, a notion of similarity introduced in [Proceedings of the Tenth Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 1645, Springer, Berlin, 1999, p. 281] that takes into account the primary, secondary, and tertiary structures. In general, computing this distance is NP-hard for tertiary structures. In this paper, we consider the problem under some constraints. We present an algorithm and then show how to use it in practical applications.
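To make the structural definitions above concrete, the following minimal Python sketch checks whether a set of base pairs satisfies the secondary-structure assumptions stated in the abstract: only A–U and C–G bonds, one-to-one pairing, and no crossing pairs. It is an illustration only, not the algorithm of the paper. The function name `is_secondary_structure`, the 0-based position indices, and the representation of pairs as integer tuples are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Bonds allowed by the abstract's definition: A-U and C-G, in either order.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_secondary_structure(seq: str, pairs: Iterable[Tuple[int, int]]) -> bool:
    """Return True if `pairs` is a valid secondary structure over `seq`.

    Hypothetical helper: positions are 0-based indices into `seq`.
    """
    used = set()                      # positions already bonded
    normalized: List[Tuple[int, int]] = []
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if i == j or i < 0 or j >= len(seq):
            return False              # not a pair of two valid positions
        if (seq[i], seq[j]) not in ALLOWED_PAIRS:
            return False              # bond is neither A-U nor C-G
        if i in used or j in used:
            return False              # violates the one-to-one assumption
        used.update((i, j))
        normalized.append((i, j))

    # Non-crossing check: scan pairs by opening position and keep the
    # closing positions of the currently enclosing pairs on a stack.
    normalized.sort()
    open_closers: List[int] = []
    for i, j in normalized:
        while open_closers and open_closers[-1] < i:
            open_closers.pop()        # that pair closed before i
        if open_closers and open_closers[-1] < j:
            return False              # pairs (k, l), (i, j) with k < i < l < j
        open_closers.append(j)
    return True
```

A set of pairs that fails only the non-crossing test, such as `[(0, 2), (1, 3)]` over `"GACU"`, corresponds to what the abstract calls a tertiary structure, where crossing bonds are permitted.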
