Rapid 3D protein structure database searching using information retrieval techniques

MOTIVATION: As three-dimensional (3D) protein structure databases grow rapidly in size, exhaustive database searching, in which a 3D query structure is compared with each and every structure in the database, becomes inefficient. We propose a rapid 3D protein structure retrieval system named 'ProtDex2', which adopts techniques used in information retrieval systems to search the database rapidly without having to access every 3D structure in it. The retrieval process is based on an inverted-file index constructed on feature vectors that describe the relationships between the secondary structure elements (SSEs) of all the 3D protein structures in the database. ProtDex2 significantly improves upon its predecessor, ProtDex, in both speed and accuracy.

RESULTS: The experimental results show that ProtDex2 is much faster than two well-known protein structure comparison methods, DALI and CE, without sacrificing the accuracy of the comparison. Compared with TopScan, a similar SSE-based method, ProtDex2 is much faster and achieves a comparable degree of accuracy.

AVAILABILITY: The software is available at: http://xena1.ddns.comp.nus.edu.sg/~genesis/PD2.htm
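The idea of searching through an inverted file instead of comparing the query against every structure can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration only: the `SSEPair` fields (SSE types, inter-axis angle, midpoint distance), the bin widths in `to_term`, and the IDF-weighted overlap score in `InvertedIndex.search` are hypothetical and are not ProtDex2's actual feature vectors or scoring function. What the sketch does show is the property the abstract relies on: a query only reads the postings lists of its own terms, so structures that share no term with the query are never touched.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSEPair:
    """Hypothetical relationship between two SSEs of one structure (assumed features)."""
    type_a: str       # 'H' (helix) or 'E' (strand); assumed labels
    type_b: str
    angle_deg: float  # angle between the two SSE axes (assumed feature)
    distance: float   # distance between SSE midpoints in angstroms (assumed feature)


def to_term(pair, angle_bin=30.0, dist_bin=4.0):
    """Discretize a continuous feature vector into a hashable index term.
    Bin widths are illustrative assumptions, not ProtDex2 parameters."""
    types = tuple(sorted((pair.type_a, pair.type_b)))  # order-independent pair type
    return (types, int(pair.angle_deg // angle_bin), int(pair.distance // dist_bin))


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {protein_id: term frequency}
        self.n_docs = 0

    def add(self, protein_id, pairs):
        """Index one structure, given its list of SSE-pair feature vectors."""
        self.n_docs += 1
        for term, tf in Counter(to_term(p) for p in pairs).items():
            self.postings[term][protein_id] = tf

    def search(self, query_pairs, top_k=5):
        """Accumulate scores only for structures sharing at least one term with the query.
        Hypothetical scoring: shared term count weighted by an IDF-like factor."""
        scores = defaultdict(float)
        for term, qtf in Counter(to_term(p) for p in query_pairs).items():
            plist = self.postings.get(term)
            if not plist:
                continue  # term absent from the database: nothing to read
            idf = math.log(1 + self.n_docs / len(plist))
            for pid, tf in plist.items():
                scores[pid] += min(qtf, tf) * idf
        # Highest score first; ties broken by protein id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Usage with made-up toy structures (hypothetical ids and feature values).
index = InvertedIndex()
index.add("protA", [SSEPair("H", "H", 20.0, 10.0), SSEPair("H", "E", 95.0, 12.5)])
index.add("protB", [SSEPair("E", "E", 170.0, 4.8), SSEPair("E", "E", 175.0, 5.0)])
index.add("protC", [SSEPair("H", "H", 25.0, 11.0)])
query = [SSEPair("H", "H", 22.0, 9.5), SSEPair("E", "H", 100.0, 13.0)]
print(index.search(query))  # protA ranks first, then protC; protB is never scored
```

Here the query's helix-helix and helix-strand terms hit the postings of `protA` and `protC`, while `protB`, which has only strand-strand pairs, is skipped entirely. Discretizing continuous features into terms is what makes an inverted file applicable at all; wider bins favour recall, narrower bins favour selectivity.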