A Comparison of Generative Models for Sequence Design

Recent work has explored variants of the cross-entropy method for black-box optimization of discrete structures such as DNA sequences. Over multiple rounds of calls to the black-box function, a generative sequence model is trained to place high probability on sequences that achieve high function values. These works employ sophisticated deep generative models, which have high sample complexity and require extensive tuning. By contrast, simple generative models, such as hidden Markov models, have achieved widespread success in computational biology applications. Motivated by this, we evaluate the performance of simple generative models when used within the cross-entropy method. We find that simple generative models are competitive with more sophisticated models on two synthetic optimization tasks inspired by biological sequence design.
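To make the loop described above concrete, here is a minimal sketch of the cross-entropy method with about the simplest generative model available: an independent categorical distribution per sequence position. It is an illustration under stated assumptions, not the setup evaluated in the paper. The objective `toy_black_box`, the site-independent model, and all hyperparameters (`rounds`, `batch`, `elite_frac`, `pseudocount`) are hypothetical choices made for this example.

```python
import numpy as np

ALPHABET = "ACGT"  # DNA alphabet; sequences are stored as integer indices


def toy_black_box(seqs, target):
    """Hypothetical objective: number of positions that match a hidden target."""
    return (seqs == target).sum(axis=1)


def cross_entropy_method(f, length, rounds=20, batch=200, elite_frac=0.2,
                         pseudocount=1.0, seed=0):
    """Sketch of the cross-entropy method with a site-independent model.

    f maps an integer array of shape (batch, length) to one score per row.
    Returns the final per-position probabilities, the best sequence seen
    (as a string), and its score.
    """
    rng = np.random.default_rng(seed)
    k = len(ALPHABET)
    probs = np.full((length, k), 1.0 / k)  # start from the uniform model
    n_elite = max(1, int(batch * elite_frac))
    best_seq, best_val = None, -np.inf

    for _ in range(rounds):
        # Sample a batch from the current model by inverting each
        # position's cumulative distribution.
        cdf = probs.cumsum(axis=1)
        u = rng.random((batch, length, 1))
        seqs = np.minimum((u > cdf[None, :, :]).sum(axis=2), k - 1)

        # One round of calls to the black-box function.
        scores = np.asarray(f(seqs))
        order = np.argsort(scores)
        if scores[order[-1]] > best_val:
            best_val = scores[order[-1]]
            best_seq = seqs[order[-1]].copy()

        # Refit the model by maximum likelihood on the top-scoring
        # sequences, smoothed with a pseudocount so that no symbol's
        # probability collapses to exactly zero.
        elite = seqs[order[-n_elite:]]
        counts = (elite[:, :, None] == np.arange(k)).sum(axis=0)
        probs = (counts + pseudocount) / (n_elite + k * pseudocount)

    return probs, "".join(ALPHABET[i] for i in best_seq), best_val


# Example usage with a randomly drawn hidden target of length 12.
target = np.random.default_rng(1).integers(0, len(ALPHABET), size=12)
probs, best_seq, best_val = cross_entropy_method(
    lambda s: toy_black_box(s, target), length=12)
```

Replacing the refit step with a richer model (for example a hidden Markov model or a neural network) leaves the rest of the loop unchanged, and that swap is the kind of comparison the abstract describes.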
