Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models

In unsupervised data generation tasks, besides generating samples that resemble previous observations, one often wants to guide the model so that generation is biased towards desirable metrics. We propose a method that combines Generative Adversarial Networks (GANs) and reinforcement learning (RL) to accomplish exactly that. While RL biases the data generation process towards arbitrary metrics, the GAN component of the reward function ensures that the model still retains what it learned from the data. We build on previous work that combined GANs and RL to generate sequence data, and we test the model in several settings: the generation of molecules encoded as text sequences (SMILES) and music generation, showing in each case that the generation process can be effectively biased towards the desired metrics.
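A minimal sketch of the reward blending described above, under stated assumptions: the function and variable names (`organ_reward`, `lambda_mix`, `discriminator_score`, `objective_score`) are illustrative rather than taken from the paper, and the per-step Monte Carlo rollouts a full sequence-GAN setup would use are collapsed here into a single terminal reward per sequence, combined with a plain REINFORCE estimator.

```python
import numpy as np

def organ_reward(sequence, discriminator_score, objective_score, lambda_mix=0.5):
    """Blend the adversarial and objective rewards for one generated sequence.

    lambda_mix = 1.0 recovers a plain sequence GAN; lambda_mix = 0.0 optimizes
    the domain metric alone. Both scores are assumed to lie in [0, 1].
    """
    d = discriminator_score(sequence)   # how "real" the discriminator finds the sequence
    o = objective_score(sequence)       # normalized domain metric (e.g. drug-likeness)
    return lambda_mix * d + (1.0 - lambda_mix) * o

def reinforce_gradient(log_prob_grads, rewards, baseline=None):
    """REINFORCE estimator: weight each sequence's log-likelihood gradient by its centered reward."""
    rewards = np.asarray(rewards, dtype=float)
    if baseline is None:
        baseline = rewards.mean()       # simple baseline for variance reduction
    advantages = rewards - baseline
    # Batch average of advantage-weighted gradients.
    return sum(a * g for a, g in zip(advantages, log_prob_grads)) / len(rewards)

if __name__ == "__main__":
    # Toy check with stand-in scoring functions (assumptions, not the paper's metrics).
    fake_d = lambda s: 0.8
    fake_o = lambda s: 0.3
    print(organ_reward("CCO", fake_d, fake_o, lambda_mix=0.5))  # 0.55
```

In a molecular setting, `objective_score` could wrap a normalized cheminformatics metric such as RDKit's QED (drug-likeness) or a synthetic-accessibility score; the key design choice is keeping both reward terms on the same [0, 1] scale so the mixing weight meaningfully trades off realism against the target property.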
