Local similarity in RNA secondary structures

We present a systematic treatment of alignment distance and local similarity algorithms on trees and forests. We build upon the tree alignment algorithm for ordered trees given by Jiang et al. (1995) and extend it to calculate local forest alignments, which is essential for finding locally similar regions in RNA secondary structures. The time complexity of our algorithm is $O(|F_1| \cdot |F_2| \cdot \deg(F_1) \cdot \deg(F_2) \cdot (\deg(F_1) + \deg(F_2)))$, where $|F_i|$ is the number of nodes in forest $F_i$ and $\deg(F_i)$ is the degree of $F_i$. We provide carefully engineered dynamic programming implementations using dense, two-dimensional tables, which considerably reduce the space requirement. We suggest a new representation of RNA secondary structures as forests that allows reasonable scoring of edit operations on RNA secondary structures. The comparison of RNA secondary structures is facilitated by a new visualization technique for RNA secondary structure alignments. Finally, we show how potential regulatory motifs can be discovered solely by their structural preservation, independent of their sequence conservation and position.
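The abstract does not spell out the forest encoding or the alignment recurrence, so the Python code below is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical encoding in which an unpaired base is a leaf labelled with its nucleotide and a base pair becomes a node `P` whose children are the 5' base, the enclosed substructure and the 3' base. The score follows a leftmost-tree decomposition in the spirit of the Jiang et al. (1995) tree alignment: either the two leftmost roots are aligned with each other, or one of them is aligned to a gap and its children are aligned with a prefix of the other forest. The names `dot_bracket_to_forest` and `forest_alignment`, the scores (`match=2`, `mismatch=-1`, `gap=-1`) and the definition of local similarity as the best score over all pairs of contiguous sibling ranges are assumptions made for illustration.

```python
from functools import lru_cache

# A tree is (label, children) and a forest is a tuple of trees.


def dot_bracket_to_forest(seq, db):
    """Hypothetical encoding: unpaired base -> leaf; base pair -> node 'P' whose
    children are the 5' base, the enclosed substructure and the 3' base."""
    if len(seq) != len(db):
        raise ValueError("sequence and structure lengths differ")
    stack = [[]]
    for base, ch in zip(seq, db):
        if ch == "(":
            stack.append([(base, ())])            # open a pair with its 5' base
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets")
            kids = stack.pop() + [(base, ())]     # close it with its 3' base
            stack[-1].append(("P", tuple(kids)))
        else:
            stack[-1].append((base, ()))
    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    return tuple(stack[0])


def _index(forest):
    """Preorder-number nodes; node 0 is a virtual root above the top-level trees."""
    labels, kids = [None], [[]]

    def visit(tree, parent):
        nid = len(labels)
        labels.append(tree[0])
        kids.append([])
        kids[parent].append(nid)
        for child in tree[1]:
            visit(child, nid)

    for tree in forest:
        visit(tree, 0)
    size = [1] * len(kids)
    for n in reversed(range(len(kids))):  # children have larger ids than parents
        size[n] += sum(size[c] for c in kids[n])
    return labels, [tuple(k) for k in kids], size


def forest_alignment(F, G, match=2, mismatch=-1, gap=-1):
    """Return (global_score, local_score) of an alignment of forests F and G."""
    lf, cf, zf = _index(F)
    lg, cg, zg = _index(G)

    @lru_cache(maxsize=None)
    def S(p, s, e, q, t, u):
        # Best score aligning trees cf[p][s:e] of F with trees cg[q][t:u] of G.
        if s == e:
            return gap * sum(zg[c] for c in cg[q][t:u])
        if t == u:
            return gap * sum(zf[c] for c in cf[p][s:e])
        v, w = cf[p][s], cg[q][t]
        nv, nw = len(cf[v]), len(cg[w])
        # 1) leftmost roots v and w aligned with each other
        best = ((match if lf[v] == lg[w] else mismatch)
                + S(v, 0, nv, w, 0, nw) + S(p, s + 1, e, q, t + 1, u))
        # 2) v aligned to a gap: its children align with G's trees t..k-1
        for k in range(t, u + 1):
            best = max(best, gap + S(v, 0, nv, q, t, k) + S(p, s + 1, e, q, k, u))
        # 3) w aligned to a gap: symmetric to case 2
        for k in range(s, e + 1):
            best = max(best, gap + S(p, s, k, w, 0, nw) + S(p, k, e, q, t + 1, u))
        return best

    global_score = S(0, 0, len(cf[0]), 0, 0, len(cg[0]))
    # Assumed local score: best over all pairs of non-empty sibling ranges, or 0.
    local_score = max(0, max(
        (S(p, s, e, q, t, u)
         for p in range(len(cf)) for s in range(len(cf[p]))
         for e in range(s + 1, len(cf[p]) + 1)
         for q in range(len(cg)) for t in range(len(cg[q]))
         for u in range(t + 1, len(cg[q]) + 1)),
        default=0))
    return global_score, local_score


if __name__ == "__main__":
    F = dot_bracket_to_forest("GAC", "(.)")
    print(forest_alignment(F, dot_bracket_to_forest("A", ".")))  # (-1, 2)
```

Memoizing on sibling ranges `(parent, start, end)` rather than on whole subforests keeps the number of subproblems near $|F_1| \cdot \deg(F_1) \cdot |F_2| \cdot \deg(F_2)$, each doing $O(\deg(F_1) + \deg(F_2))$ work, which is consistent with the time bound stated in the abstract. The authors instead use dense, two-dimensional tables; this sketch makes no attempt to reproduce their space savings.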

[1] R. Jansen et al. mRNA localization: message on the move, 2001, Nature Reviews Molecular Cell Biology.

[2] Sergey Steinberg et al. Compilation of tRNA sequences and sequences of tRNA genes, 2004, Nucleic Acids Res.

[3] Kaizhong Zhang. Computing similarity between RNA secondary structures, 1998, Proceedings. IEEE International Joint Symposia on Intelligence and Systems (Cat. No.98EX174).

[4] Kaizhong Zhang et al. On the Editing Distance Between Unordered Labeled Trees, 1992, Inf. Process. Lett.

[5] R. Gutell et al. Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective, 1994, Microbiological Reviews.

[6] Michael Zuker et al. Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide, 1999.

[7] Michael J. Fischer et al. The String-to-String Correction Problem, 1974, JACM.

[8] M. Hentze et al. Molecular control of vertebrate iron metabolism: mRNA-based regulatory circuits operated by iron, nitric oxide, and oxidative stress, 1996, Proceedings of the National Academy of Sciences of the United States of America.

[9] Kaizhong Zhang. A New Editing based Distance between Unordered Labeled Trees, 1993, CPM.

[10] Christine Guthrie et al. Spliceosomal snRNAs: Mg2+-Dependent Chemistry at the Catalytic Core?, 2002, Cell.

[11] N. Gray et al. Control of translation initiation in animals, 1998, Annual Review of Cell and Developmental Biology.

[12] Kaizhong Zhang et al. Identifying Approximately Common Substructures in Trees Based on a Restricted Edit Distance, 1999, Inf. Sci.

[13] R. Klausner et al. Regulating the fate of mRNA: The control of cellular iron metabolism, 1993, Cell.

[14] Andrzej Lingas et al. A Fast Algorithm for Optimal Alignment between Similar Ordered Trees, 2001, CPM.

[15] Leon D. Segal et al. Functions, 1995.

[16] Lusheng Wang et al. Alignment of trees: an alternative to tree edit, 1995.

[17] Philip N. Klein et al. Computing the Edit-Distance between Unrooted Ordered Trees, 1998, ESA.

[18] Sean R. Eddy et al. What is dynamic programming?, 2004, Nature Biotechnology.

[19] Kuo-Chung Tai et al. The Tree-to-Tree Correction Problem, 1979, JACM.

[20] R. Nussinov et al. Tree graphs of RNA secondary structures and their comparisons, 1989, Computers and Biomedical Research, an International Journal.

[21] Kaizhong Zhang et al. Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems, 1989, SIAM J. Comput.

[22] Kaizhong Zhang et al. Identifying consensus of trees through alignment, 2000, Inf. Sci.

[23] T. Kiss. Small Nucleolar RNAs: An Abundant Group of Noncoding RNAs with Diverse Cellular Functions, 2002, Cell.

[24] Graziano Pesole et al. UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs, 2000, Nucleic Acids Res.

[25] Bin Ma et al. A General Edit Distance between RNA Structures, 2002, J. Comput. Biol.

[26] Bruce A. Shapiro et al. An algorithm for comparing multiple RNA secondary structures, 1988, Comput. Appl. Biosci.

[27] J. Guhaniyogi et al. Regulation of mRNA stability in mammalian cells, 2001, Gene.

[28] Robert Giegerich et al. A systematic approach to dynamic programming in bioinformatics, 2000, Bioinformatics.

[29] S. Eddy et al. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, 1997, Nucleic Acids Research.

[30] M. S. Waterman et al. Identification of common molecular subsequences, 1981, Journal of Molecular Biology.

[31] Kaizhong Zhang et al. An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[32] Z. Galil et al. Pattern matching algorithms, 1997.

[33] Kaizhong Zhang et al. Comparing multiple RNA secondary structures using tree comparisons, 1990, Comput. Appl. Biosci.

[34] Robert Giegerich et al. Algebraic Dynamic Programming, 2002, AMAST.