Towards Scaleable Protein Structure Comparison and Database Search

Comparing protein structures in three dimensions is computationally expensive, which makes an exhaustive scan of a query protein against a library of known protein structures impractical. To reduce this cost, an approximation of the three-dimensional structure can be used to compare proteins quickly and filter out dissimilar ones. In this paper, we present a new algorithm for protein structure comparison called SCALE. In SCALE, a protein is represented as a sequence of secondary structure elements (SSEs) augmented with 3D structural properties, such as the distances and angles between SSEs. Comparing two proteins is therefore reduced to aligning their corresponding SSE sequences, with the 3D structural properties contributing to the similarity score between the two sequences. We have implemented SCALE and compared its performance against existing schemes. Our performance study shows that SCALE outperforms existing methods in both efficiency and effectiveness (measured by precision and recall). To avoid exhaustive search, we also propose an index based on the structural properties; given a query protein, the index prunes away a considerable number of dissimilar proteins.
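The core idea, reducing 3D comparison to an alignment of SSE sequences whose match score reflects structural properties, can be illustrated with a standard Needleman-Wunsch global alignment. The sketch below is not SCALE's actual scoring scheme: the `SSE` fields (`kind`, `length`, `dist_next`, `angle_next`), the weighting in `pair_score`, and the gap penalty `GAP` are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SSE:
    """One secondary structure element with hypothetical structural features."""
    kind: str          # 'H' for helix, 'E' for strand (assumed labels)
    length: int        # number of residues in the element
    dist_next: float   # distance (angstroms) to the next SSE; 0.0 for the last SSE (assumed convention)
    angle_next: float  # angle (degrees, 0..180) to the next SSE; 0.0 for the last SSE (assumed convention)


GAP = -1.0  # hypothetical linear gap penalty per skipped SSE


def pair_score(a: SSE, b: SSE) -> float:
    """Hypothetical match score: penalize type mismatches, otherwise
    reward agreement of length, distance and angle (max 2.0 when identical)."""
    if a.kind != b.kind:
        return -1.0
    # Each similarity term lies in [0, 1]; 1.0 means the properties agree exactly.
    len_sim = 1.0 - abs(a.length - b.length) / max(a.length, b.length, 1)
    dist_sim = math.exp(-abs(a.dist_next - b.dist_next) / 5.0)
    ang_sim = 1.0 - abs(a.angle_next - b.angle_next) / 180.0
    return 2.0 * (len_sim + dist_sim + ang_sim) / 3.0


def align(p: list[SSE], q: list[SSE]) -> float:
    """Needleman-Wunsch global alignment score between two SSE sequences."""
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per SSE.
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(p[i - 1], q[j - 1]),  # match SSEs
                dp[i - 1][j] + GAP,                                 # skip an SSE in p
                dp[i][j - 1] + GAP,                                 # skip an SSE in q
            )
    return dp[n][m]


if __name__ == "__main__":
    query = [SSE("H", 12, 10.0, 30.0), SSE("E", 6, 8.0, 150.0), SSE("H", 14, 0.0, 0.0)]
    target = [SSE("H", 11, 10.5, 35.0), SSE("H", 14, 0.0, 0.0)]
    print(align(query, query))   # self-alignment: 3 identical matches of 2.0 each -> 6.0
    print(align(query, target))  # lower, since one SSE must be gapped
```

Because each SSE sequence is far shorter than the underlying residue sequence, the quadratic alignment runs over a few dozen elements rather than hundreds of residues, which is where the speedup of an SSE-level representation comes from.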
