A Probabilistic Approach to a Consensus Multiple Alignment

We consider the problem of obtaining the maximum a posteriori probability (MAP) estimate of a consensus ancestral sequence for a set of DNA sequences. Our maximization method, called ASA (dnA Sequence Alignment), can be applied to the refinement of noisy regions of a DNA assembly, to the alignment of genomic functional sites, or to the alignment of any set of DNA sequences related by a star-like phylogeny. Along with the optimal consensus, ASA finds suboptimal solutions together with their relative probabilities. The probabilistic approach makes it possible to establish the limits to which an ancestor can in principle be recovered from diverged sequences. In simulations on rather short synthetic sequences (of length up to 80) with varying coverage and error rates ranging from 5% to 30%, ASA restored the consensus from noisy observations essentially as well as is theoretically possible for the given error rates. We also illustrate the performance of ASA on the alignment of E. coli promoters and the Alu-Sb subfamily of human repeat sequences. Since our model is a special case of a profile HMM, we compare the two approaches and also compare ASA with other DNA alignment methods.