An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance.

We have devised and implemented in PrISM (Protein Informatics System for Modeling) a new measure of protein structural relationships, the protein structural distance (PSD). The PSD is designed to describe relationships between protein structures in quantitative rather than descriptive terms, and it is applicable both when two structures are very similar and when they are very different. It is calculated with a structural alignment procedure that uses double dynamic programming to align secondary structure elements, followed by an iterative rigid-body superposition that minimizes the root-mean-square deviation of Cα atoms. The alignment algorithm, as implemented on a modest workstation, is computationally efficient, allowing for large-scale structural comparisons. PSD scores for more than one and a half million pairs of proteins were calculated and compared to the discrete classification of proteins in the SCOP database. The PSD scores, which were obtained automatically, are in large part consistent with the manually derived classifications in SCOP. Discrepancies do arise, however, partly because SCOP uses criteria other than structural similarity to derive classifications, while the PrISM procedure is exclusively structure based. Analysis of PSD scores suggests that there is a continuous aspect of protein conformation space, even though discrete classification schemes are extremely useful. The use of a continuous measure for structural distance between all pairs of proteins allows us, as described in the two accompanying papers, to derive sequence/structure relationships in a more quantitative way than has previously been possible. An important strength of the approach implemented in PrISM is its ability to address many different kinds of queries interactively, making its structural comparison procedure a convenient computational tool that complements structural classification databases such as SCOP and CATH.
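
The abstract names dynamic programming over secondary structure elements as the first stage of the alignment. As a rough illustration only, the sketch below runs a plain Needleman-Wunsch global alignment over strings of element types (H for helix, E for strand). It is not the double dynamic programming procedure used in PrISM, which is not specified here; the function name align_sse, the one-letter encoding, and the match, mismatch, and gap scores are all assumptions made for this example.

```python
from typing import Tuple


def align_sse(a: str, b: str, match: int = 2, mismatch: int = -1,
              gap: int = -1) -> Tuple[int, str, str]:
    """Globally align two strings of secondary-structure element types.

    Hypothetical helper: scores are illustrative, not PrISM parameters.
    Returns (score, aligned_a, aligned_b), with '-' marking a gap.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        # Substitution score for element a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # pair the two elements
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal path
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Four elements (helix, strand, strand, helix) against three
    print(align_sse("HEEH", "HEH"))
```

In a full pipeline of the kind the abstract describes, the element pairs matched at this stage would seed the rigid-body superposition of Cα atoms; that step is omitted from this sketch.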
