Gypsum-DL: an open-source program for preparing small-molecule libraries for structure-based virtual screening

Computational techniques such as structure-based virtual screening require carefully prepared 3D models of potential small-molecule ligands. Though powerful, existing commercial programs for virtual-library preparation have restrictive and/or expensive licenses. Freely available alternatives, though often effective, do not fully account for all possible ionization, tautomeric, and ring-conformational variants. Here we present Gypsum-DL, a free, robust open-source program that addresses these challenges. As input, Gypsum-DL accepts virtual compound libraries in SMILES or flat SDF formats. For each molecule in the virtual library, it enumerates appropriate ionization, tautomeric, chiral, cis/trans isomeric, and ring-conformational forms. As output, Gypsum-DL produces an SDF file containing each molecular form, with 3D coordinates assigned. To demonstrate its utility, we processed 1558 molecules taken from the NCI Diversity Set VI and 56,608 molecules taken from a Distributed Drug Discovery (D3) combinatorial virtual library. We also used 4463 high-quality protein–ligand complexes from the PDBbind database to show that Gypsum-DL processing can improve virtual-screening pose prediction. Gypsum-DL is available free of charge under the terms of the Apache License, Version 2.0.
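To illustrate why enumerating chiral and cis/trans isomeric forms can enlarge a library quickly, the following is a minimal sketch of combinatorial stereoisomer enumeration. It is not Gypsum-DL's implementation: the function name `enumerate_stereo_variants` and the site labels (`C2`, `C5`, `C7=C8`) are hypothetical, and the sketch assumes each unspecified site is independent, ignoring molecular symmetry (e.g., meso forms) that a real cheminformatics toolkit would take into account.

```python
from itertools import product


def enumerate_stereo_variants(chiral_centers, double_bonds):
    """Yield every R/S and E/Z assignment for the given unspecified sites.

    Assumption: sites are independent, so the result is a plain Cartesian
    product. Symmetry-equivalent assignments are NOT merged.
    """
    sites = list(chiral_centers) + list(double_bonds)
    # Two choices per chiral center (R/S) and per double bond (E/Z).
    choices = [("R", "S")] * len(chiral_centers) + [("E", "Z")] * len(double_bonds)
    for labels in product(*choices):
        yield dict(zip(sites, labels))


# Hypothetical molecule: two unspecified chiral centers, one unspecified double bond.
variants = list(enumerate_stereo_variants(["C2", "C5"], ["C7=C8"]))
print(len(variants))  # 2 * 2 * 2 = 8 assignments
```

Because the count doubles with every unspecified site, a practical tool would typically cap the number of variants kept per input molecule.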
