Ultrafast shape recognition for similarity search in molecular databases

Molecular databases are routinely screened for compounds that most closely resemble a molecule of known biological activity, in order to provide novel drug leads. It is widely believed that three-dimensional molecular shape is the most discriminating pattern for biological activity, as it is directly related to the steep repulsive part of the interaction potential between the drug-like molecule and its macromolecular target. However, efficient comparison of molecular shape remains a challenge. Here, we show that a new approach based on moments of distance distributions recognizes molecular shape at least three orders of magnitude faster than current methodologies. Such an ultrafast method permits the identification of similarly shaped compounds within the largest molecular databases. In addition, the problematic requirement of aligning molecules for comparison is circumvented, as the proposed distributions are independent of molecular orientation. Our methodology could also be adapted to tackle similarly hard problems in other fields, such as designing content-based Internet search engines for three-dimensional geometrical objects or performing fast similarity comparisons between proteins. From a broader perspective, we anticipate that ultrafast pattern recognition will soon become not only useful, but essential for addressing the data explosion currently experienced in most scientific disciplines.
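To make the idea of moments of distance distributions concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say which reference points or which moments are used, so the choices here are assumptions: four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), three moments per distribution (mean, standard deviation, and the signed cube root of the third central moment), and a score of the form 1 / (1 + mean absolute difference). The function names `moments`, `shape_descriptor`, and `similarity` are hypothetical.

```python
import math

def moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]

def distances(coords, point):
    """Distances from every atom to one reference point."""
    return [math.dist(atom, point) for atom in coords]

def shape_descriptor(coords):
    """12-number descriptor; the reference points are an assumed choice."""
    n = len(coords)
    centroid = tuple(sum(atom[i] for atom in coords) / n for i in range(3))
    d_centroid = distances(coords, centroid)
    closest = coords[d_centroid.index(min(d_centroid))]
    farthest = coords[d_centroid.index(max(d_centroid))]
    d_farthest = distances(coords, farthest)
    far_from_far = coords[d_farthest.index(max(d_farthest))]

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(moments(distances(coords, ref)))
    return descriptor

def similarity(a, b):
    """Assumed score in (0, 1]: 1 / (1 + mean absolute descriptor difference)."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)

# Toy coordinates (not a real molecule), one (x, y, z) tuple per atom.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.3, 0.0),
         (3.4, 1.2, 0.8), (0.2, -1.1, 0.5)]
query_descriptor = shape_descriptor(query)
```

Because every entry is built from interatomic distances alone, rotating or translating the coordinates leaves the descriptor unchanged up to rounding, which is why no alignment step is needed. Comparing two molecules then reduces to comparing two short vectors, so a database can be screened by precomputing one descriptor per compound.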
