Paper information - Structure-dependent sequence alignment for remotely related proteins

Structure-dependent sequence alignment for remotely related proteins

MOTIVATION: The quality of a model structure derived from comparative modeling is dictated by the accuracy of the predicted sequence-template alignment. As sequence-template pairs become more remote in sequence relationship, predicting their alignment with sequence alignment methods alone becomes increasingly problematic. Structural information from the template, used together with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs.

RESULTS: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the structures in each training pair are similar (RMSD < ~4 Å), but the sequence relationship is undetectable (average pairwise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating global alignments under the constraint of the residue pairs in the local alignments. This composite alignment procedure was assessed on a testing set of 1421 protein pairs, in which the structures of each pair are similar (RMSD < ~4 Å) but the sequences are marginally related at best (average pairwise sequence identity = 13%). The assessment showed that the composite alignment procedure predicted more aligned residue pairs, with an average 27% increase in correctly aligned residues over the standard PSI-BLAST alignments for the protein pairs in the testing set.
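The idea of extending a local alignment into a global one "under the constraint of the residue pairs in the local alignments" can be illustrated with a minimal sketch. This is not the SDSA procedure: the scoring here is a plain match/mismatch/linear-gap scheme chosen purely for illustration (no structural terms, no substitution matrix), and the names `needleman_wunsch`, `extend_with_anchors`, and `aligned_pairs` are hypothetical. The sketch assumes the anchor pairs are given as 0-based `(i, j)` indices that increase strictly in both sequences, and it fills the regions between anchors with ordinary Needleman-Wunsch global alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return one optimal global alignment of a and b as two gapped strings.

    Scoring is an illustrative assumption, not the scheme used in the paper.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def extend_with_anchors(a, b, anchors):
    """Globally align a and b while forcing each (i, j) in anchors to be paired.

    The segments before, between, and after the anchors are aligned
    independently, so the anchored pairs are kept unchanged.
    """
    out_a, out_b = [], []
    prev_i = prev_j = 0
    for i, j in anchors:
        if i < prev_i or j < prev_j:
            raise ValueError("anchors must increase strictly in both sequences")
        seg_a, seg_b = needleman_wunsch(a[prev_i:i], b[prev_j:j])
        out_a += [seg_a, a[i]]
        out_b += [seg_b, b[j]]
        prev_i, prev_j = i + 1, j + 1
    seg_a, seg_b = needleman_wunsch(a[prev_i:], b[prev_j:])
    out_a.append(seg_a)
    out_b.append(seg_b)
    return ''.join(out_a), ''.join(out_b)


def aligned_pairs(al_a, al_b):
    """List the (i, j) residue index pairs that share a column."""
    pairs, i, j = [], 0, 0
    for x, y in zip(al_a, al_b):
        if x != '-' and y != '-':
            pairs.append((i, j))
        i += x != '-'
        j += y != '-'
    return pairs


# Toy usage: one anchor pairing the 'W' residues (index 5 in a, index 2 in b).
query, template = "HEAGAWGHEE", "PAWHEAE"
al_q, al_t = extend_with_anchors(query, template, [(5, 2)])
print(al_q)
print(al_t)
```

Comparing `aligned_pairs` of a predicted alignment against those of a structure-based reference alignment is one simple way to count correctly aligned residues, which is the kind of quantity the 27% figure above refers to; the exact accuracy definition used in the paper is not given in this abstract.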

An-Suei Yang
