Improving protein function prediction with synthetic feature samples created by generative adversarial networks

Protein function prediction is a challenging but important task in bioinformatics. Many prediction methods have been developed, but their accuracy is still limited by the small number of available training samples. It is therefore valuable to develop a data augmentation method that generates high-quality synthetic samples to further improve the accuracy of these prediction methods. In this work, we propose a novel generative adversarial network-based method, FFPred-GAN, to accurately learn the high-dimensional distributions of protein sequence-based biophysical features and to generate high-quality synthetic protein feature samples. The experimental results suggest that augmenting the original training protein feature samples with these synthetic samples improves prediction accuracy for all three domains of the Gene Ontology.
