Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions

We study the problem of computing optimal spaced seeds for detecting sequences generated by a hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and we use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding, owing to fewer missed exons in DNA/DNA comparison, and of more effective homology search in general; they may also have applications outside of bioinformatics.

Daniel G. Brown | Tomáš Vinař | Broňa Brejová
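
The abstract states that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and that this is used to find optimal seeds. The Python sketch below illustrates one straightforward way to do this under stated assumptions; it is not presented as the authors' algorithm. An alignment region is assumed to be a binary string (1 = match, 0 = mismatch), a spaced seed is written with `1` for a required match and `*` for a don't-care position, and a forward-style dynamic program tracks pairs (hidden state, last span - 1 emitted bits) while accumulating the probability of the first seed hit. The `HMM` class, the function names, and the three-state codon model with its conservation rates are hypothetical. Because the number of tracked bit histories grows as 2^(span - 1), this version is only practical for short seeds, and the optimal-seed search is plain exhaustive enumeration over seeds of a fixed weight.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class HMM:
    # Hypothetical container: a hidden Markov model that emits 1 (match) or 0 (mismatch).
    init: list[float]         # init[s]: probability of starting in state s
    trans: list[list[float]]  # trans[s][t]: probability of moving from state s to t
    match_prob: list[float]   # match_prob[s]: probability that state s emits a match


def seed_hits(seed: str, window: tuple) -> bool:
    """True if every '1' in the seed lines up with a match (1) in the window."""
    return all(bit == 1 for ch, bit in zip(seed, window) if ch == "1")


def hit_probability(hmm: HMM, seed: str, n: int) -> float:
    """Probability that `seed` matches at least once in a length-n region
    emitted by `hmm`. The DP state is (hidden state, last span-1 bits);
    only probability mass that has not hit yet is carried forward."""
    span = len(seed)
    keep = span - 1
    alive = {(s, ()): p for s, p in enumerate(hmm.init) if p > 0}
    hit = 0.0
    for _ in range(n):
        nxt = defaultdict(float)
        for (s, recent), p in alive.items():
            q = hmm.match_prob[s]
            for bit, pb in ((1, p * q), (0, p * (1 - q))):
                if pb == 0:
                    continue
                window = recent + (bit,)
                if len(window) == span and seed_hits(seed, window):
                    hit += pb  # first hit on this path: count it and stop tracking
                    continue
                tail = window[max(0, len(window) - keep):]  # bits a later window still needs
                for t, pt in enumerate(hmm.trans[s]):
                    if pt > 0:
                        nxt[(t, tail)] += pb * pt
        alive = nxt
    return hit


def best_seed(hmm: HMM, weight: int, max_span: int, n: int) -> tuple[str, float]:
    """Exhaustive search over seeds that start and end with '1', contain
    `weight` ones (weight >= 2), and have span <= max_span."""
    if weight < 2 or max_span < weight:
        raise ValueError("need 2 <= weight <= max_span")
    best = None
    for span in range(weight, max_span + 1):
        for inner in combinations(range(1, span - 1), weight - 2):
            chars = ["*"] * span
            for i in (0, span - 1, *inner):
                chars[i] = "1"
            seed = "".join(chars)
            score = hit_probability(hmm, seed, n)
            if best is None or score > best[1]:
                best = (seed, score)
    return best


# Hypothetical 3-periodic model: states 0, 1, 2 stand for codon positions
# visited in a fixed cycle; the third position is assumed less conserved.
codon = HMM(
    init=[1 / 3, 1 / 3, 1 / 3],
    trans=[[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    match_prob=[0.8, 0.8, 0.5],
)

if __name__ == "__main__":
    print(hit_probability(codon, "1111", 32))  # contiguous seed of weight 4
    print(best_seed(codon, weight=4, max_span=8, n=32))
```

Replacing the codon model with a one-state model whose single match probability is p reduces the computation to the usual i.i.d. setting; for example, the seed `1` then hits a region of length n with probability 1 - (1 - p)^n.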
