DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks

Automated protein function prediction is critical for annotating uncharacterized protein sequences, and accurate prediction methods are still needed. Recently, deep learning-based methods have outperformed conventional algorithms in computer vision and natural language processing, owing to better prevention of overfitting and more efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests and benchmarked using three types of protein descriptors, training datasets of varying sizes, and GO terms from different levels. Furthermore, to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed on the CAFA2 and CAFA3 challenge datasets, in comparison with state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred in a literature-based case study on the 'biofilm formation process' in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.
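
As an illustration of what a multi-task feed-forward network looks like in this setting, the following minimal sketch shows one model with a shared hidden layer and one sigmoid output per GO term. It is not the DEEPred implementation: the layer sizes, the ReLU activation, the random initialization, the class name MultiTaskNet, and the placeholder term labels are all assumptions made for illustration, and only the forward pass is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return x if x > 0.0 else 0.0


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases (an illustrative choice).
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    # Fully connected layer: activation(W x + b), computed row by row.
    weights, biases = layer
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


class MultiTaskNet:
    """Hypothetical multi-task model: shared hidden layer, one output per GO term."""

    def __init__(self, n_features, n_hidden, go_terms, seed=0):
        rng = random.Random(seed)
        self.go_terms = list(go_terms)
        self.hidden = init_layer(n_features, n_hidden, rng)
        self.output = init_layer(n_hidden, len(self.go_terms), rng)

    def predict(self, features):
        # The hidden representation is shared by all tasks (GO terms).
        h = dense(features, self.hidden, relu)
        # Each output unit gives an independent score in (0, 1) for one term.
        scores = dense(h, self.output, sigmoid)
        return dict(zip(self.go_terms, scores))


# Placeholder labels, not real GO identifiers.
terms = ["GO_TERM_A", "GO_TERM_B", "GO_TERM_C"]
model = MultiTaskNet(n_features=4, n_hidden=8, go_terms=terms)
scores = model.predict([0.2, 0.0, 0.5, 0.3])
for term, score in scores.items():
    print(term, round(score, 3))
```

In a hierarchical stack, several such models would each cover a group of GO terms; how terms are grouped and how the models are trained is outside the scope of this sketch.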
