A Generative Adversarial Network Model for Disease Gene Prediction With RNA-seq Data

Deep learning models often need large numbers of training samples (thousands) to extract hidden patterns in the data effectively and achieve good results. In the field of brain-related disease, however, the omics data obtained with advanced sequencing technology typically contain far fewer patient samples (tens to hundreds). Because of this small-sample problem, statistical methods and machine learning methods have been unable to obtain a convergent gene set when prioritizing biomarkers. Furthermore, mathematical models designed for prioritizing biomarkers perform differently on different datasets. The architecture of the generative adversarial network (GAN) can address this bottleneck. Through the game between the generator and the discriminator, the generator can produce samples whose distribution is similar to that of the training samples, and the prediction accuracy and robustness of the discriminator can be significantly improved. Therefore, in this study, we designed a new generative adversarial network model with a denoising auto-encoder (DAE) as the generator and a multilayer perceptron (MLP) as the discriminator. The prediction residual error is backpropagated to the decoder part of the DAE, modifying the captured probability distribution. Based on this model, we further designed a framework to predict disease genes with RNA-seq data. The deep learning model improves the identification accuracy of disease genes over state-of-the-art approaches. An analysis of the experimental results uncovered new disease-related genes and disease-associated pathways in the brain, which in turn provide insight into the molecular mechanisms underlying disease phenotypes.
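
The generator and discriminator pairing described above can be illustrated with a minimal forward-pass sketch in NumPy. It is not the model from this study: the layer sizes, the Gaussian corruption, the activation functions, the synthetic input matrix, and the standard non-saturating GAN losses are all assumptions chosen for illustration, and no training loop or backpropagation is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_generator(x, params, noise_std=0.1):
    """Corrupt x with Gaussian noise, encode it, then decode a reconstruction."""
    (w_enc, b_enc), (w_dec, b_dec) = params
    x_noisy = x + rng.normal(0.0, noise_std, size=x.shape)  # assumed corruption
    hidden = np.tanh(x_noisy @ w_enc + b_enc)                # encoder
    return sigmoid(hidden @ w_dec + b_dec)                   # decoder

def mlp_discriminator(x, params):
    """One-hidden-layer MLP returning the probability that each row is real."""
    (w1, b1), (w2, b2) = params
    hidden = np.maximum(0.0, x @ w1 + b1)                    # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2).ravel()

def gan_losses(d_real, d_fake, eps=1e-8):
    """Standard GAN losses (assumed; the study's exact objective is not given)."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))                  # non-saturating form
    return d_loss, g_loss

# Hypothetical dimensions: 8 samples, 20 genes, 5 hidden units.
n_samples, n_genes, n_hidden = 8, 20, 5
x_real = rng.random((n_samples, n_genes))  # synthetic stand-in for scaled expression

gen_params = (init_layer(n_genes, n_hidden), init_layer(n_hidden, n_genes))
disc_params = (init_layer(n_genes, n_hidden), init_layer(n_hidden, 1))

x_fake = dae_generator(x_real, gen_params)
d_loss, g_loss = gan_losses(
    mlp_discriminator(x_real, disc_params),
    mlp_discriminator(x_fake, disc_params),
)
```

In a full implementation the gradient of the generator loss would flow back through the discriminator into the decoder weights, which corresponds to the step in which the prediction error modifies the distribution captured by the DAE.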
