De novo generation of hit-like molecules from gene expression signatures using artificial intelligence

Finding new molecules with a desired biological activity is an extremely difficult task. In this context, artificial intelligence and generative models have been used for de novo molecular design and compound optimization. Herein, we report a generative model that bridges systems biology and molecular design by conditioning a generative adversarial network on transcriptomic data. By doing so, we can automatically design molecules that have a high probability of inducing a desired transcriptomic profile. As long as the gene expression signature of the desired state is provided, this model is able to design active-like molecules for desired targets without any previous target annotation of the training compounds. Molecules designed by this model are more similar to active compounds than those identified by similarity of gene expression signatures. Overall, this method represents an alternative approach to bridging chemistry and biology in the long and difficult road of drug discovery.

High-quality hit identification remains a considerable challenge in de novo drug design. Here, the authors train a generative adversarial network with transcriptome profiles induced by a large set of compounds, enabling it to design molecules that are likely to induce desired expression profiles.
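The baseline mentioned above, identifying compounds by similarity of gene expression signatures, can be illustrated with a minimal sketch. It assumes cosine similarity as the measure (the abstract does not name one), and the compound names, the four-gene signatures, and the helper functions `cosine_similarity` and `rank_by_signature` are all hypothetical toy values chosen for illustration, not data or code from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero signature carries no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_signature(query, reference):
    """Return (compound_id, similarity) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query, sig)) for cid, sig in reference.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up signatures over four hypothetical genes (assumption, not real data).
reference_signatures = {
    "compound_A": [1.0, -0.5, 0.0, 2.0],
    "compound_B": [-1.0, 0.5, 0.0, -2.0],
    "compound_C": [0.0, 1.0, 1.0, 0.0],
}

# Hypothetical desired expression signature for the target state.
desired_signature = [2.0, -1.0, 0.0, 4.0]

ranking = rank_by_signature(desired_signature, reference_signatures)
for compound_id, similarity in ranking:
    print(f"{compound_id}: {similarity:.3f}")
```

A lookup of this kind can only return compounds that already have a measured signature, whereas the generative model described in the abstract proposes new structures conditioned on the desired signature.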
