Virtual Chemical Libraries.

Advances in computer processing speed and storage capacity have enabled researchers to generate virtual chemical libraries containing billions of molecules. While these numbers appear large, they are only a small fraction of the number of organic molecules that could potentially be synthesized. This review provides an overview of recent advances in the generation and use of virtual chemical libraries in medicinal chemistry. We also consider the practical implications of these libraries in drug discovery programs and highlight a number of current and future challenges.
