GenUI: interactive and extensible open source software platform for de novo molecular generation and cheminformatics

Many contemporary cheminformatics methods, including computer-aided de novo drug design, hold promise to significantly accelerate drug discovery and reduce its cost. Thanks to this attractive outlook, the field has thrived, and in the past few years it has grown especially quickly, mainly due to the emergence of novel methods based on deep neural networks. This growth is also apparent in de novo drug design, where many new generative algorithms are now available. However, widespread adoption of new generative techniques in fields like medicinal chemistry or chemical biology still lags behind the most recent developments. On closer inspection, this is not surprising: successfully integrating the most recent de novo drug design methods into existing processes and pipelines requires close collaboration between diverse groups of experimental and theoretical scientists. Therefore, to accelerate the adoption of both modern and traditional de novo molecular generators, we developed Generator User Interface (GenUI), a software platform that makes it possible to integrate molecular generators within a feature-rich graphical user interface that is easy to use for experts of diverse backgrounds. GenUI is implemented as a web service, and its interfaces offer access to cheminformatics tools for data preprocessing, model building, molecule generation, and interactive chemical space visualization. Moreover, the platform is easy to extend with customizable frontend React.js components and backend Python extensions. GenUI is open source, and a recently developed de novo molecular generator, DrugEx, was integrated as a proof of principle. In this work, we present the architecture and implementation details of GenUI and discuss how it can facilitate collaboration among the disparate communities interested in de novo molecular generation and computer-aided drug discovery.
