Forest alignment with affine gaps and anchors, applied in RNA structure comparison

We present two enhancements to Jiang's tree alignment algorithm, motivated by experience with its use for RNA structure alignment. The first is an affine gap model, which increases the runtime only by a constant factor. The second is a speed-up of the alignment algorithm when certain nodes in the trees are pre-aligned by a so-called anchoring. Both enhancements are included in a new implementation of the tool RNAforester. We evaluate the new algorithm on two applications related to RNA secondary structure analysis. Drawing on this experience, we suggest a new formulation of the tree alignment model based on regular tree languages and rewrite rules.
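To illustrate why an affine gap model typically costs only a constant factor, the following sketch shows the classical sequence analogue (Gotoh-style alignment with three tables in place of one). It is not the forest alignment algorithm of this work: the function name `affine_gap_distance`, the cost parameters, and the convention that a gap of length k costs `gap_open + k * gap_extend` are assumptions chosen for illustration only.

```python
import math


def affine_gap_distance(a, b, mismatch=1, gap_open=2, gap_extend=1):
    """Minimal alignment cost of sequences a and b under affine gap costs.

    Illustrative assumption: a gap run of length k costs
    gap_open + k * gap_extend; matches cost 0, mismatches cost `mismatch`.
    """
    n, m = len(a), len(b)
    inf = math.inf
    # Three tables instead of one, distinguished by the last alignment column:
    #   M: a[i-1] aligned with b[j-1]
    #   X: a[i-1] aligned with a gap (deletion run)
    #   Y: b[j-1] aligned with a gap (insertion run)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    start = gap_open + gap_extend  # cost of the first column of a new gap run
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either extend an existing gap run or open a new one.
            X[i][j] = min(X[i - 1][j] + gap_extend,
                          M[i - 1][j] + start,
                          Y[i - 1][j] + start)
            Y[i][j] = min(Y[i][j - 1] + gap_extend,
                          M[i][j - 1] + start,
                          X[i][j - 1] + start)

    return min(M[n][m], X[n][m], Y[n][m])
```

Each cell is filled in constant time from a fixed number of neighbours, so the three tables multiply the work of the linear-gap recurrence by a constant. The abstract states that the same holds for the tree alignment setting; the recurrences used there are not reproduced here.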
