We consider the problem of obtaining the maximum a posteriori probability (MAP) estimate of a consensus ancestral sequence for a set of DNA sequences. Our maximization method, called ASA (dnA Sequence Alignment), can be applied to the refinement of noisy regions of a DNA assembly, to the alignment of genomic functional sites, or to the alignment of any set of DNA sequences related by a star-like phylogeny. Along with the optimal consensus, ASA finds suboptimal solutions together with their relative probabilities. The probabilistic approach makes it possible to establish the limits to which an ancestor can, in principle, be recovered from diverged sequences. In simulations on short synthetic sequences (of length up to 80) with varying coverage and error rates ranging from 5% to 30%, ASA restored the consensus from noisy observations essentially as well as is theoretically possible for the given error rates. We also illustrate the performance of ASA on the alignment of E. coli promoters and of the Alu-Sb subfamily of human repeat sequences. Since our model is a special case of a profile HMM, we compare the two approaches and also compare ASA with other DNA alignment methods.
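To make the idea of a MAP consensus with relative probabilities concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the ASA algorithm. It assumes the reads are already aligned and of equal length, that errors are substitutions only (no insertions or deletions), that each wrong base is equally likely, and that the prior over ancestral bases is uniform. The names `column_posterior` and `map_consensus` are hypothetical and chosen for illustration.

```python
import math

ALPHABET = "ACGT"


def column_posterior(column, error_rate):
    """Posterior over the ancestral base for one gap-free column.

    Assumption (not from the source): each observed base equals the
    ancestor with probability 1 - error_rate, and is each of the other
    three bases with probability error_rate / 3. The prior is uniform.
    """
    log_like = {}
    for ancestor in ALPHABET:
        total = 0.0
        for observed in column:
            if observed == ancestor:
                total += math.log(1.0 - error_rate)
            else:
                total += math.log(error_rate / 3.0)
        log_like[ancestor] = total
    # Normalize in a numerically stable way (subtract the maximum first).
    peak = max(log_like.values())
    weights = {a: math.exp(v - peak) for a, v in log_like.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


def map_consensus(reads, error_rate):
    """Return the per-column MAP consensus and the per-column posteriors."""
    consensus = []
    posteriors = []
    for column in zip(*reads):
        post = column_posterior(column, error_rate)
        consensus.append(max(ALPHABET, key=lambda a: post[a]))
        posteriors.append(post)
    return "".join(consensus), posteriors


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGT"]
    consensus, posteriors = map_consensus(reads, error_rate=0.1)
    print(consensus)
    print(posteriors[3])  # relative probabilities of the suboptimal bases
```

In this toy setting the posterior of the second-best base in a column plays the role of a suboptimal solution with its relative probability. As the error rate grows or the coverage shrinks, the posteriors flatten, which illustrates why there is a limit to how well an ancestor can be recovered. The full model described above also handles insertions and deletions, which this sketch deliberately omits.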