Feedback GAN for DNA optimizes protein functions

Generative adversarial networks (GANs) represent an attractive and novel approach to generating realistic data, such as genes, proteins or drugs, in synthetic biology. Here, we apply GANs to generate synthetic DNA sequences encoding proteins of variable length. We propose a novel feedback-loop architecture, feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyser. The proposed architecture also has the advantage that the analyser does not need to be differentiable. We apply the feedback-loop mechanism to two examples: generating synthetic genes coding for antimicrobial peptides, and optimizing synthetic genes for the secondary structure of their resulting peptides. A suite of metrics, calculated in silico, demonstrates that the GAN-generated proteins have desirable biophysical properties. The FBGAN architecture can also be used to optimize GAN-generated data points for useful properties in domains beyond genomics.

Generative machine learning models are used in synthetic biology to find new structures such as DNA sequences, proteins and other macromolecules, with applications in drug discovery, environmental treatment and manufacturing. Gupta and Zou propose and demonstrate in silico a feedback-loop architecture that optimizes the output of a generative adversarial network for synthetic genes, steering it towards genes that specifically code for antimicrobial peptides.
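The feedback-loop idea can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_generator` stands in for a trained GAN generator by sampling random DNA strings, `toy_analyser` is a hypothetical black-box scorer (GC fraction), and the update rule, in which accepted sequences displace the oldest entries of a fixed-size training set, is one plausible way to close the loop. The point it shows is that the analyser is only called as a function and compared against a threshold, so it never needs to be differentiable.

```python
import random
from collections import deque

BASES = "ACGT"


def toy_generator(n, length, rng):
    """Stand-in for a GAN generator (assumption): sample n random DNA strings."""
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


def toy_analyser(seq):
    """Hypothetical black-box scorer: fraction of G/C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def feedback_step(training_set, generated, analyser, threshold):
    """Add generated sequences scoring at or above threshold to the training set.

    training_set is a deque with maxlen set, so each accepted sequence
    evicts the oldest entry. Returns the number of accepted sequences.
    """
    accepted = [s for s in generated if analyser(s) >= threshold]
    training_set.extend(accepted)
    return len(accepted)


rng = random.Random(0)
training_set = deque(toy_generator(20, 30, rng), maxlen=20)

for epoch in range(3):
    batch = toy_generator(50, 30, rng)
    n_accepted = feedback_step(training_set, batch, toy_analyser, threshold=0.6)
    # In a real setup, the GAN would be retrained on training_set at this point,
    # so that later batches drift towards sequences the analyser favours.
```

Because the toy generator is never retrained, the sketch only demonstrates the data flow of the loop, not any improvement in the generated sequences.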
