FSees: Customized Enumeration of Chemical Subspaces with Limited Main Memory Consumption

In the search for new marketable drugs, new ideas are required constantly. Particularly with regard to challenging targets and previously patented chemical space, designing novel molecules is crucial. This demands efficient and innovative computational tools to generate libraries of promising molecules. Here we present an efficient method to generate such libraries by systematically enumerating all molecules in a specific chemical space. This space is defined by a fragment space and a set of user-defined physicochemical properties (e.g., molecular weight, tPSA, number of H-bond donors and acceptors, or predicted logP). To enumerate a very large number of molecules, our algorithm uses file-based data structures instead of memory-based ones, overcoming the limitations of computer main memory. The resulting chemical library can be used as a starting point for computational lead-finding technologies such as similarity searching, pharmacophore mapping, docking, or virtual screening. We applied the algorithm in different scenarios, creating numerous target-specific libraries. Furthermore, we generated a fragment space from all approved drugs in DrugBank and enumerated it with lead-like constraints, yielding 0.5 billion molecules in the molecular weight range 250-350.
