Adversarial generation of gene expression data

The problem of reverse engineering gene regulatory networks from high-throughput expression data is one of the biggest challenges in bioinformatics. To benchmark network inference algorithms, simulators that produce well-characterized expression datasets are often required. However, existing simulators have been criticized for failing to emulate key properties of gene expression data. In this study, we address two problems. First, we propose mechanisms to faithfully assess the realism of a synthetic gene expression dataset. Second, we design gGAN, an adversarial simulator of expression data based on a Generative Adversarial Network (GAN). We show that our model outperforms existing simulators by a large margin, achieving realism scores up to 17 times higher than those of GeneNetWeaver and SynTReN. More importantly, our results show that gGAN is, to the best of our knowledge, the first simulator that passes the Turing test for gene expression data proposed by Maier et al. (2013).
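The abstract does not specify how realism is scored. As a minimal sketch of one statistic a realism assessment could plausibly include (an assumption for illustration, not the measure used to evaluate gGAN), the code below compares the distribution of gene-gene Pearson correlations in a real and a synthetic expression matrix using the 1-D Wasserstein distance. The function names `pairwise_gene_correlations` and `correlation_distance`, and the toy generator `toy_dataset` with its `coupling` parameter, are hypothetical; the data are random numbers, not measured expression.

```python
import numpy as np


def pairwise_gene_correlations(expr: np.ndarray) -> np.ndarray:
    """Upper-triangular gene-gene Pearson correlations of an expression matrix.

    `expr` has shape (n_samples, n_genes): one row per sample, one column per gene.
    """
    corr = np.corrcoef(expr, rowvar=False)   # (n_genes, n_genes) correlation matrix
    upper = np.triu_indices_from(corr, k=1)  # each gene pair once, diagonal excluded
    return corr[upper]


def correlation_distance(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Hypothetical realism check (lower means more realistic).

    1-D Wasserstein distance between the distributions of gene-gene
    correlations in the real and synthetic datasets. Both datasets must cover
    the same genes, so the two correlation vectors have equal length and the
    distance equals the mean absolute difference of their sorted values.
    """
    if real.shape[1] != synthetic.shape[1]:
        raise ValueError("real and synthetic data must cover the same genes")
    r = np.sort(pairwise_gene_correlations(real))
    s = np.sort(pairwise_gene_correlations(synthetic))
    return float(np.mean(np.abs(r - s)))


def toy_dataset(rng: np.random.Generator, n_samples: int, n_genes: int,
                coupling: float) -> np.ndarray:
    """Hypothetical toy generator: every gene loads on one shared latent factor.

    coupling > 0 induces positive gene-gene correlations; coupling = 0 yields
    independent genes.
    """
    latent = rng.normal(size=(n_samples, 1))
    return coupling * latent + rng.normal(size=(n_samples, n_genes))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = toy_dataset(rng, 200, 20, coupling=1.0)
    similar = toy_dataset(rng, 200, 20, coupling=1.0)      # same structure, new draw
    independent = toy_dataset(rng, 200, 20, coupling=0.0)  # no co-expression at all
    print("similar:    ", correlation_distance(real, similar))
    print("independent:", correlation_distance(real, independent))
```

Run as a script, it should report a much smaller distance for `similar`, which shares the co-expression structure of `real`, than for `independent`, whose genes are uncorrelated. A statistic like this only looks at pairwise co-expression and ignores, for example, marginal expression levels, so it would normally be one check among several.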

[1] Ronald W. Davis et al. Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray, 1995, Science.

[2] Kathleen Marchal et al. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms, 2006, BMC Bioinformatics.

[3] Julio Collado-Vides et al. RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions, 2005, Nucleic Acids Research.

[4] Ralf Zimmer et al. A Turing test for artificial expression data, 2013, Bioinformatics.

[5] B. Williams et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq, 2008, Nature Methods.

[6] T. Speed et al. Summaries of Affymetrix GeneChip probe level data, 2003, Nucleic Acids Research.

[7] G. Cooper. Cells as Experimental Models, 2000.

[8] Ian J. Goodfellow et al. NIPS 2016 Tutorial: Generative Adversarial Networks, 2016, arXiv.

[9] Jaakko Lehtinen et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017, ICLR.

[10] Paul P. Wang et al. Advances to Bayesian network inference for generating causal networks from observational biological data, 2004, Bioinformatics.

[11] Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[12] Dario Floreano et al. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods, 2011, Bioinformatics.

[13] Yoshua Bengio et al. Generative Adversarial Nets, 2014, NIPS.

[14] Jeremiah J. Faith et al. Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata, 2007, Nucleic Acids Research.

[15] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, Journal of Machine Learning Research.

[16] E. Boersma et al. Prevention of Catheter-Related Bacteremia with a Daily Ethanol Lock in Patients with Tunnelled Catheters: A Randomized, Placebo-Controlled Trial, 2010, PLoS ONE.

[17] R. Sokal et al. The Comparison of Dendrograms by Objective Methods, 1962.

[18] Aaron C. Courville et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[19] Chris Wiggins et al. ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context, 2004, BMC Bioinformatics.

[20] Kevin Y. Yip et al. Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data, 2010, PLoS ONE.

[21] S. Busby et al. Global regulators of transcription in Escherichia coli: mechanisms of action and methods for study, 2008, Advances in Applied Microbiology.

[22] Fabio Rinaldi et al. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond, 2015, Nucleic Acids Research.