An efficient index-based protein structure database searching method

In this paper, we present a novel indexing method called ProtDex to facilitate fast searching in 3-dimensional protein structure databases. In ProtDex, we first build an index on the representative properties of all proteins in the database. When evaluating a query, we use the index to filter the database down to a small candidate list of proteins. We can then either report the candidates directly to the user, with their respective rankings, or perform the expensive actual alignments on them at the user's request. Preliminary experimental results show that our solution is up to 16 times faster than the popular DALI method for the database searching task (without actual alignments), while its overall accuracy is only slightly inferior to that of DALI. The software is available upon request by emailing the authors.
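The filter-then-rank flow described above can be illustrated with a minimal sketch. It is not the ProtDex implementation: the abstract does not specify which representative properties are indexed, so the string keys (such as "HH:5"), the function names `build_index` and `search`, and the shared-key count used for ranking are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(db):
    """Build an inverted index: feature key -> set of protein ids.

    `db` maps a protein id to a set of discrete feature keys.
    The keys are hypothetical placeholders for whatever
    representative properties a real system would extract.
    """
    index = defaultdict(set)
    for pid, keys in db.items():
        for key in keys:
            index[key].add(pid)
    return index


def search(index, query_keys, top_k=2):
    """Return up to `top_k` (protein id, shared key count) pairs.

    Only proteins sharing at least one key with the query are
    touched, which is the filtering step. Ranking by shared key
    count is an assumed, simplistic score.
    """
    hits = Counter()
    for key in set(query_keys):
        for pid in index.get(key, ()):
            hits[pid] += 1
    # Sort by descending score, then by id so ties are deterministic.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy database with made-up feature keys.
db = {
    "protA": {"HH:5", "HE:7", "EE:4"},
    "protB": {"HH:5", "HE:7", "EE:9"},
    "protC": {"EE:9", "HE:2"},
}
index = build_index(db)
result = search(index, {"HH:5", "HE:7", "EE:4"})
```

In a setup like this, the expensive pairwise alignment would be run only on the short list returned by `search`, and only if the user asks for it.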
