Surface similarity-based molecular query-retrieval

Background

Discerning the similarity between molecules is a challenging problem in drug discovery as well as in molecular biology. The importance of this problem is due to the fact that the biochemical characteristics of a molecule are closely related to its structure. Therefore, molecular similarity is a key notion in investigations targeting exploration of molecular structural space, query-retrieval in molecular databases, and structure-activity modelling. Determining molecular similarity is related to the choice of molecular representation. Currently, representations with high descriptive power and physical relevance, such as 3D surface-based descriptors, are available. Information from such representations is both surface-based and volumetric. However, most techniques for determining molecular similarity tend to focus on idealized 2D graph-based descriptors due to the complexity that accompanies reasoning with more elaborate representations.

Results

This paper addresses the problem of determining similarity when molecules are described using complex surface-based representations. It proposes an intrinsic, spherical representation that systematically maps points on a molecular surface to points on a standard coordinate system (a sphere). Molecular surface properties such as shape, field strengths, and effects due to field super-positioning can then be captured as distributions on the surface of the sphere. Surface-based molecular similarity is subsequently determined by computing the similarity of the surface-property distributions using a novel formulation of histogram-intersection. The similarity formulation is not only sensitive to the 3D distribution of the surface properties, but is also highly efficient to compute.

Conclusion

The proposed method obviates the computationally expensive step of molecular pose-optimisation, can incorporate conformational variations, and facilitates highly efficient determination of similarity by directly comparing molecular surfaces and surface-based properties. Retrieval performance, applications in structure-activity modeling of complex biological properties, and comparisons with existing research and commercial methods demonstrate the validity and effectiveness of the approach.
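As a rough illustration of the kind of comparison described above, the following sketch computes the classic, normalised histogram intersection between two binned surface-property distributions. It is not the novel formulation proposed in the paper, which the abstract does not specify. The function name `histogram_intersection`, the flat list-of-bins layout, and the assumption that each bin holds a non-negative accumulated property value for one patch of the sphere are all hypothetical choices made for this example.

```python
def histogram_intersection(h_query, h_model):
    """Classic normalised histogram intersection (illustrative only).

    Assumption: h_query and h_model are equal-length sequences of
    non-negative bin values, e.g. a surface property accumulated over
    matching patches of the sphere. This is NOT the paper's formulation.
    """
    if len(h_query) != len(h_model):
        raise ValueError("histograms must have the same number of bins")
    total = sum(h_model)
    if total == 0:
        # An empty model histogram shares nothing with any query.
        return 0.0
    # Sum the overlap bin by bin, then normalise by the model's mass.
    overlap = sum(min(q, m) for q, m in zip(h_query, h_model))
    return overlap / total


# Hypothetical 4-bin distributions for two molecules.
query = [1.0, 0.0, 0.0, 3.0]
model = [2.0, 0.0, 1.0, 1.0]
score = histogram_intersection(query, model)
```

The score is 1.0 when the model histogram is fully covered by the query and 0.0 when the two share no mass in any bin. Each comparison is a single linear pass over the bins, which is in keeping with the efficiency the abstract emphasises.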
