Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery

The bacterial type IV secretion system (T4SS) is reported to be one of the most ubiquitous secretion systems (SSs) in nature, and it can cause serious disease by secreting type IV secretion system effectors (T4SEs) into host cells. Recent studies have mainly focused on annotating new T4SEs from the vast amount of available sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, the performance of these tools is reported to depend heavily on the selected methods, and their annotation performance needs further improvement. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of popular T4SE annotation tools on an independent benchmark. Second, the false discovery rates of the various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting real-world non-T4SEs validated by published experiments. Based on these analyses, three encoding strategies, (a) position-specific scoring matrix (PSSM), (b) protein secondary structure and solvent accessibility (PSSSA) and (c) the one-hot encoding scheme (Onehot), were identified as performing well when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. Overall, this study comprehensively analyzed the performance of a collection of encoding strategies integrated with CNN, which could facilitate the suppression of T4SS-mediated infection and help limit the spread of antimicrobial resistance.
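The abstract does not specify the network architecture or how the three models are combined, so the following is only a minimal, untrained sketch of the general idea, written in plain Python. It assumes one-hot input over the 20 standard amino acids, a single 1D convolutional layer with ReLU and global max pooling, and a unanimous-vote rule for combining models (a hypothetical choice meant to illustrate how requiring agreement can lower false discoveries). All function names, parameters and the example sequence are hypothetical and are not taken from CNN-T4SE. Because PSSM profiles (typically generated with PSI-BLAST) and PSSSA features require external tools, the three stand-in models below all read one-hot input and differ only in their random weights.

```python
import math
import random

# Standard 20 amino acids; nonstandard residues (X, B, U, ...) become all-zero rows.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def onehot_encode(seq, max_len=100):
    """One-hot encode a protein sequence into a max_len x 20 matrix (truncated or zero-padded)."""
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq.upper()[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1.0
    return matrix


def make_toy_cnn(n_filters=4, width=5, n_features=20, seed=0):
    """Randomly initialised (untrained) weights for a one-layer 1D CNN; illustration only."""
    rng = random.Random(seed)
    kernels = [[[rng.gauss(0, 0.5) for _ in range(n_features)] for _ in range(width)]
               for _ in range(n_filters)]
    biases = [0.0] * n_filters
    out_weights = [rng.gauss(0, 0.5) for _ in range(n_filters)]
    return kernels, biases, out_weights, 0.0


def cnn_score(matrix, params):
    """Forward pass: 1D convolution -> ReLU -> global max pooling -> dense -> sigmoid."""
    kernels, biases, out_weights, out_bias = params
    pooled = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid starting maximum
        for start in range(len(matrix) - width + 1):
            # Dot product of the kernel with the window of residues [start, start + width).
            act = bias + sum(
                w * x
                for k in range(width)
                for w, x in zip(kernel[k], matrix[start + k])
            )
            best = max(best, max(0.0, act))  # ReLU, then global max pooling
        pooled.append(best)
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # probability of being a T4SE


def combine_predictions(probs, threshold=0.5, rule="unanimous"):
    """Combine per-model probabilities into one T4SE call (hypothetical combination rules)."""
    votes = sum(p >= threshold for p in probs)
    if rule == "unanimous":
        return votes == len(probs)  # every model must call it a T4SE
    if rule == "majority":
        return 2 * votes > len(probs)
    raise ValueError(f"unknown rule: {rule}")


def false_discovery_rate(predicted, actual):
    """FDR = FP / (TP + FP); defined as 0.0 when nothing is predicted positive."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return fp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    x = onehot_encode("MSKLTEQQLKAIEALLK")  # arbitrary example sequence
    # Three differently seeded toy CNNs stand in for CNN-PSSM, CNN-PSSSA and CNN-Onehot.
    probs = [cnn_score(x, make_toy_cnn(seed=s)) for s in (1, 2, 3)]
    print(probs, combine_predictions(probs))
```

Requiring all models to agree trades some sensitivity for fewer false positives; whether the published tool uses this rule, a majority vote or another scheme is not stated in the abstract.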
