Guest Editorial: WABI Special Section Part 1

THE Fourth International Workshop on Algorithms in BIoinformatics (WABI) 2004 was held in Bergen, Norway, September 2004. The program committee consisted of 33 members and selected, among 117 submissions, 39 to be presented at the workshop and included in the proceedings from the workshop (volume 3240 of Lecture Notes in Bioinformatics, series edited by Sorin Istrail, Pavel Pevzner, and Michael Waterman). The WABI 2004 program committee selected a small number of papers among the 39 to be invited to submit extended versions of their papers to a special section of the IEEE/ACM Transactions on Computational Biology and Bioinformatics. Four papers were published in the OctoberDecember 2004 issue of the journal and this issue contains an additional three papers. We would like to thank both the entire program committee for WABI and the reviewers of the papers in this issue for their valuable contributions. The first of the papers is “A New Distance for High Level RNA Secondary Structure Comparison” authored by Julien Allali and Marie-France Sagot. This paper describes algorithms for comparing secondary structuresofRNAmolecules where the structures are represented by trees. The problemof classifying RNA secondary structure is becoming critical as biologists are discovering more and more noncoding functional elements in the genome (e.g., miRNA). Most likely, the major functional determinants of the elements are their secondary structure and, therefore, a metric between such secondary structures will also help delineate clusters of functional groups. In Allali and Sagot’s paper, two tree representations of secondary structure are compared by analysing how one tree can be transformed into the other using an allowed set of operations. Each operation can be associatedwith a cost and the distance between two trees can then be defined as the minimum cost associated with a transform of one tree to the other. Allali and Sagot introduce two new operations that they name edge fusion and node fusion and show that these alleviate limitations associated with the classical tree edit operations used for RNA comparison. Importantly, they also present algorithms for calculating the distance between trees allowing the new operations in addition to the classical ones, and analyze the performance of the algorithms. The second paper is “Topological Rearrangements and Local Search Method for Tandem Duplication Trees” and is authored by Denis Bertrand and Olivier Gascuel. The paper approaches the problem of estimating the evolutionary history of tandem repeats. A tandem repeat is a stretch of DNA sequence that contains an element that is repeated multiple times and where the repeat occurrences are next to each other in the sequence. Since the repeats are subject to mutations, they are not identical. Therefore, tandem repeats occur through evolution by “copying” (duplication) of repeat elements in blocks of varying size. Bertrand and Gascuel address the problem of finding the most likely sequence of events giving rise to the observed set of repeats. Each sequence of events can be described by a duplication tree and one searches for the tree that is the most parsimonious, i.e., one that explains how the sequence has evolved from an ancestral single copy with a minimum number of mutations along the branches of the tree. The main difference with the standard phylogeny problem is that linear ordering of the tandem duplications impose constraints the possible binary tree form. This paper describes a local search method that allows exploration of the complete space of possible duplication trees and shows that the method is superior to other existing methods for reconstructing the tree and recovering its duplication events. The third paper is “Optimizing Multiple Seeds for Homology Search” authored by Daniel G. Brown. The paper presents an approach to selecting starting points for pairwise local alignments of protein sequences. The problem of pairwise local alignment is to find a segment from each so that the two local segments can be aligned to obtain a high score. For commonly used scoring schemes, this can be solved exactly using dynamic programming. However, pairwise alignment is frequently applied to large data sets and heuristic methods for restricting alignments to be considered are frequently used, for instance, in the BLAST programs. The key is to restrict the number of alignments as much as possible, by choosing a few good seeds, without missing high scoring alignments. The paper shows that this can be formulated as an integer programming problem and presents algorithm for choosing optimal seeds. Analysis is presented showing that the approach gives four times fewer false positives (unnecessary seeds) in comparison with BLASTP without losing more good hits.