Accelerating screening of 3D protein data with a graph theoretical approach

MOTIVATION: The Dictionary of Interfaces in Proteins (DIP) is a database of the 3D structures of interacting parts of proteins, called patches. It serves as a repository in which patches similar to a given query patch can be found. Computing the similarity of two patches is time-consuming, and traversing the entire DIP takes several hours. In this work we ask how the patches similar to a given query can be identified by scanning only a small part of DIP. Answering this question requires investigating how the similarity of patches is distributed.

RESULTS: The scores describing the similarity of two patches can be divided roughly into three ranges that correspond to different levels of spatial similarity. Interestingly, the two iso-score lines separating the three ranges can be determined by two different approaches. Applying a concept from the theory of random graphs reveals significant structural properties of the data in DIP, which can be used to accelerate scanning DIP for patches similar to a given query. Searches for very similar patches could be accelerated by a factor of more than 25, and patches of medium similarity could be found 10 times faster than by brute-force search.
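The abstract does not say which random-graph concept is applied or how the accelerated scan is organised. As a rough illustration only, the sketch below assumes one plausible reading: patches are nodes, two patches are joined when their score reaches a threshold, and the size of the largest connected component is tracked as the threshold is lowered. The search then compares the query with one representative per component and scans only the components whose representative scores above the threshold. All names (`components`, `threshold_graph`, `largest_component_sizes`, `search`) and the toy score matrix are hypothetical and are not taken from DIP or from the paper.

```python
from itertools import combinations


def components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    # Largest component first; members are listed in ascending order.
    return sorted(groups.values(), key=len, reverse=True)


def threshold_graph(scores, threshold):
    """Edges between items whose pairwise score reaches the threshold."""
    n = len(scores)
    return [(i, j) for i, j in combinations(range(n), 2)
            if scores[i][j] >= threshold]


def largest_component_sizes(scores, thresholds):
    """Size of the largest component at each threshold."""
    n = len(scores)
    return {t: len(components(n, threshold_graph(scores, t))[0])
            for t in thresholds}


def search(score_to_query, comps, threshold):
    """Heuristic scan: score one representative per component, and scan the
    rest of a component only if its representative reaches the threshold.
    Returns (hits, number of score evaluations). Hits can be missed when a
    representative scores below the threshold (assumed trade-off)."""
    hits, evaluations = [], 0
    for comp in comps:
        rep = comp[0]
        evaluations += 1
        if score_to_query(rep) < threshold:
            continue  # skip the whole component
        hits.append(rep)
        for member in comp[1:]:
            evaluations += 1
            if score_to_query(member) >= threshold:
                hits.append(member)
    return hits, evaluations


# Toy symmetric score matrix for six hypothetical patches.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.85, 0.2, 0.1, 0.3],
    [0.8, 0.85, 1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.4],
    [0.2, 0.1, 0.1, 0.9, 1.0, 0.3],
    [0.1, 0.3, 0.2, 0.4, 0.3, 1.0],
]
query = [0.85, 0.9, 0.82, 0.1, 0.15, 0.2]  # hypothetical query-to-patch scores

comps = components(len(S), threshold_graph(S, 0.8))
print(largest_component_sizes(S, [0.95, 0.8, 0.3]))
print(search(lambda i: query[i], comps, 0.8))
```

On this toy data the heuristic evaluates the score five times instead of six. The speed-ups reported in the abstract refer to the authors' own method on DIP and cannot be inferred from this sketch.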
