Paper information - MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information

We have developed MUMMALS, a program that constructs multiple protein sequence alignments using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for these models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves the statistically best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, although the average improvement is small, on the order of several percent; (ii) a large collection (>10 000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimating alignment parameters and for performance tests; (iii) reference-independent evaluation of alignment quality, which uses structure superpositions derived from the sequence alignment, correlates well with reference-dependent evaluation, which compares sequence-based alignments to structure-based reference alignments.
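As an illustration of reference-dependent evaluation, the sketch below scores a sequence-based pairwise alignment by the fraction of residue pairs from a structure-based reference alignment that it reproduces. This is one common measure of this kind and is not necessarily the score used by the MUMMALS authors; the function names `aligned_pairs` and `q_score` and the toy sequences are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def q_score(test_rows, ref_rows):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(*ref_rows)
    if not ref:
        return 0.0
    test = aligned_pairs(*test_rows)
    return len(ref & test) / len(ref)


# Toy example (made-up sequences): the reference aligns four residue pairs,
# and the test alignment recovers two of them.
reference = ("ACD-EF", "AC-GEF")
candidate = ("AC-DEF", "ACGEF-")
print(q_score(candidate, reference))  # 0.5
```

A score of 1.0 means every residue pair in the reference is reproduced; pairs aligned only in the test alignment do not affect this particular measure.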

N. Grishin | Jimin Pei
