Counting, Generating and Sampling Tree Alignments

Pairwise ordered tree alignments are combinatorial objects that appear in RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e., two distinct supertrees may induce identical sets of matches between the same pair of trees. This ambiguity is uninformative and detrimental to any probabilistic analysis. In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by means of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms and opens the door to a probabilistic analysis of the space of suboptimal RNA secondary structure alignments.
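The abstract does not spell out the grammar, so the sketch below is not the tree-alignment algorithm of this work. It is a simplified analogue, assumed for illustration, that shows the same two ideas on the simplest special case, pairwise sequence alignment. First, the ambiguity: `delannoy(m, n)` counts all column sequences for two sequences of lengths m and n, whereas only C(m+n, m) distinct sets of matches exist, because adjacent insertions and deletions can be written in either order. Second, fixing a canonical order (here, deletions before insertions in every gap block) yields an unambiguous decomposition, hence a dynamic program for the partition function and a stochastic backtrace that samples match sets under a Boltzmann distribution. The functions `partition` and `sample`, and the weights `w` (per match) and `g` (per unmatched position), are hypothetical stand-ins for exp(-cost/kT) terms.

```python
import math
import random

# Sketch on *sequences*, not trees; w and g are hypothetical Boltzmann weights.


def delannoy(m: int, n: int) -> int:
    """Count all alignment column sequences of lengths m and n.

    Each column is a match (x/y), a deletion (x/-) or an insertion (-/y),
    giving D(i,j) = D(i-1,j) + D(i,j-1) + D(i-1,j-1) with D(i,0) = D(0,j) = 1.
    """
    D = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = D[i - 1][j] + D[i][j - 1] + D[i - 1][j - 1]
    return D[m][n]


def partition(a, b, w, g):
    """Partition function over match sets (alignments up to equivalence).

    A match set weighs the product of w(x, y) over its matches times g per
    unmatched position. Ambiguity is removed by the canonical order
    "deletions before insertions" inside each gap block.
    Z[i][j]: all match sets of a[:i] and b[:j].
    X[i][j]: those whose canonical column sequence does not end with an insertion.
    """
    m, n = len(a), len(b)
    Z = [[0.0] * (n + 1) for _ in range(m + 1)]
    X = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                X[i][j] = 1.0  # empty alignment
            else:
                x = 0.0
                if i > 0 and j > 0:
                    x += w(a[i - 1], b[j - 1]) * Z[i - 1][j - 1]  # match a_i with b_j
                if i > 0:
                    x += g * X[i - 1][j]  # delete a_i (no insertion may precede it)
                X[i][j] = x
            # Optionally end with an insertion of b_j.
            Z[i][j] = X[i][j] + (g * Z[i][j - 1] if j > 0 else 0.0)
    return Z, X


def sample(a, b, w, g, rng=random):
    """Draw one match set with probability proportional to its weight."""
    Z, X = partition(a, b, w, g)
    i, j, state = len(a), len(b), "Z"
    matches = []
    while i > 0 or j > 0:
        if state == "Z":
            # Leave b_j unmatched with probability g * Z[i][j-1] / Z[i][j].
            if j > 0 and rng.random() < g * Z[i][j - 1] / Z[i][j]:
                j -= 1
            else:
                state = "X"
        else:
            # Match a_i with b_j, or else delete a_i and stay in state X.
            p = w(a[i - 1], b[j - 1]) * Z[i - 1][j - 1] / X[i][j] if j > 0 else 0.0
            if rng.random() < p:
                matches.append((i - 1, j - 1))  # 0-based positions
                i, j, state = i - 1, j - 1, "Z"
            else:
                i -= 1
    return matches[::-1]


if __name__ == "__main__":
    print(delannoy(2, 2), math.comb(4, 2))  # prints "13 6"
    kT = 1.0
    w = lambda x, y: math.exp(-(0.0 if x == y else 1.0) / kT)
    g = math.exp(-0.5 / kT)
    print(sample("GATTACA", "GCATGCU", w, g))
```

With these example weights, repeated calls to `sample` favour match sets with many identical pairs and few gaps. The sketch keeps probabilities as plain floats for brevity; longer inputs may need log-space or scaled arithmetic to avoid underflow.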
