3-D Lookup: Fast Protein Structure Database Searches at 90% Reliability

There are far fewer classes of three-dimensional protein folds than sequence families, but the problem of detecting three-dimensional similarities is NP-complete. We present a novel heuristic for identifying 3-D similarities between a query structure and the database of known protein structures. Many structure alignment methods take a bottom-up approach: they first identify local matches and then solve a combinatorial problem to build these up into larger clusters of matching substructures. Here we take a top-down approach instead, starting from the global comparison and selecting a rough superimposition by a fast 3-D lookup of secondary structure motifs. The superimposition is then extended to an alignment of Cα atoms by an iterative dynamic programming step. An all-against-all comparison of 385 representative proteins (about 150,000 pairwise comparisons) took 1 day of computer time on a single R8000 processor. In other words, one query structure can be scanned against the database in a matter of minutes (1,440 min / 385 queries ≈ 4 min on average). The method is estimated to be 90% reliable at capturing statistically significant similarities. It is useful as a rapid preprocessor for a comprehensive protein structure database search system.
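
The abstract does not specify the scoring used in the dynamic programming step. The sketch below is a hypothetical illustration, not the authors' scheme: it performs one Needleman-Wunsch-style pass over two Cα traces that are assumed to be superimposed already. The similarity function `pair_score`, its distance scale `d0 = 3.5` Å and the `gap` penalty are all assumptions. In the iterative procedure described above, the aligned pairs would be used to refit the superimposition (for example with a least-squares rotation) before the next pass; that refitting is omitted here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pair_score(a: Point, b: Point, d0: float = 3.5) -> float:
    """Hypothetical distance-based similarity (an assumption, not from the paper):
    1.0 when the two Calpha atoms coincide, decaying towards 0 as they separate.
    d0 is in Angstrom."""
    d = math.dist(a, b)
    return 1.0 / (1.0 + (d / d0) ** 2)


def align_superimposed(query: Sequence[Point], target: Sequence[Point],
                       gap: float = 0.0) -> List[Tuple[int, int]]:
    """One global dynamic programming pass over two Calpha traces that are
    assumed to share a common coordinate frame already.
    Returns the aligned (query_index, target_index) pairs in order."""
    n, m = len(query), len(target)
    # score[i][j]: best total score for aligning query[:i] with target[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
        move[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
        move[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                # match query[i-1] with target[j-1]
                (score[i - 1][j - 1] + pair_score(query[i - 1], target[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),    # query residue left unaligned
                (score[i][j - 1] + gap, "left"),  # target residue left unaligned
            ]
            # key= keeps the first option on ties, so matches are preferred
            score[i][j], move[i][j] = max(options, key=lambda opt: opt[0])
    # Trace back from the bottom-right cell to recover the aligned pairs
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

For example, if the target is the query with one extra, distant residue at its N-terminus, the pass skips that residue and aligns query residue k with target residue k + 1. A distance-based score is used here because, once the structures share a frame, residues that superimpose closely are the natural candidates for equivalence.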
