Bastion3: a two-layer ensemble predictor of type III secreted effectors

MOTIVATION Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen-host interactions, considerable computational effort has gone into identifying T3SEs, and these tools have in turn stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (sometimes also incorporating the C-terminus) rather than on the complete protein sequence, and (ii) the underlying models, trained with classic algorithms, employ only a few features, most of which are extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful and informative features and investigate how to integrate them effectively into a comprehensive model.

RESULTS In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods, which each rely on a single model built on only a few features, Bastion3 explores a wide range of features of various types, trains individual models on these features and finally integrates these models through ensemble learning. We trained the models using LightGBM, a new gradient boosting machine, and further boosted their performance through a novel two-step parameter optimization strategy based on a genetic algorithm (GA). Our benchmark test demonstrates that Bastion3 performs substantially better than commonly used methods, with an ACC of 0.959, an F-value of 0.958, an MCC of 0.917 and an AUC of 0.956, outperforming all other toolkits by more than 5.6% in ACC, 5.7% in F-value, 12.4% in MCC and 5.8% in AUC.
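The two-layer design described in the results can be illustrated with a minimal, dependency-free Python sketch. It is not Bastion3's implementation: the first-layer learner is a toy logistic regression standing in for the per-feature LightGBM models, the feature-group names (`aac`, `pssm`) and the classes `LogisticModel` and `TwoLayerEnsemble` are hypothetical, and the second layer simply averages the first-layer probabilities, which is an assumption about how the models are integrated.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModel:
    """Toy first-layer learner; a hypothetical stand-in for one per-feature LightGBM model."""

    def __init__(self, lr: float = 0.5, epochs: int = 2000) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LogisticModel":
        self.w = [0.0] * len(X[0])
        n = len(X)
        for _ in range(self.epochs):  # full-batch gradient descent on the log-loss
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi
                for j, xij in enumerate(xi):
                    grad_w[j] += err * xij
                grad_b += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: Vector) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)


class TwoLayerEnsemble:
    """Layer 1: one model per feature group. Layer 2: mean of their probabilities."""

    def __init__(self, feature_groups: Sequence[str]) -> None:
        self.groups = list(feature_groups)
        self.models: Dict[str, LogisticModel] = {}

    def fit(self, samples: Sequence[Dict[str, Vector]], y: Sequence[int]) -> "TwoLayerEnsemble":
        for g in self.groups:
            # Each first-layer model sees only its own feature group
            self.models[g] = LogisticModel().fit([s[g] for s in samples], y)
        return self

    def predict_proba(self, sample: Dict[str, Vector]) -> float:
        # Second layer: unweighted mean (an assumed combiner, not necessarily Bastion3's)
        probs = [self.models[g].predict_proba(sample[g]) for g in self.groups]
        return sum(probs) / len(probs)
```

Each sample is a dictionary mapping a feature-group name to that group's numeric vector, so adding a new feature type only means adding one more first-layer model. A learned second layer (stacking) could replace the plain average without changing the first layer.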
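The GA-based parameter optimization mentioned in the results can likewise be sketched as a generic genetic algorithm with truncation selection, uniform crossover and Gaussian mutation. The abstract does not specify Bastion3's two steps, its GA operators or which parameters were tuned, so everything here is an assumption: `genetic_search` is a hypothetical helper, the LightGBM parameter names `num_leaves` and `learning_rate` appear in the bounds only for illustration, and the toy fitness function stands in for a cross-validated score such as AUC.

```python
import random
from typing import Callable, Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]
Params = Dict[str, float]


def genetic_search(
    fitness: Callable[[Params], float],
    bounds: Bounds,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Maximize `fitness` over a box of continuous parameters with a simple GA."""
    rng = random.Random(seed)
    keys = list(bounds)

    def random_individual() -> Params:
        return {k: rng.uniform(*bounds[k]) for k in keys}

    def crossover(a: Params, b: Params) -> Params:
        # Uniform crossover: each gene is copied from one parent at random
        return {k: (a if rng.random() < 0.5 else b)[k] for k in keys}

    def mutate(ind: Params) -> Params:
        # Gaussian mutation, clipped back into the allowed range
        out = dict(ind)
        for k in keys:
            if rng.random() < mutation_rate:
                lo, hi = bounds[k]
                out[k] = min(hi, max(lo, out[k] + rng.gauss(0.0, 0.1 * (hi - lo))))
        return out

    population: List[Params] = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: max(1, pop_size // 2)]  # truncation selection with elitism
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Because the best half of each generation survives unchanged, the best score never decreases from one generation to the next. In a real search, `num_leaves` would be rounded to an integer before training, and each fitness call would train and cross-validate a model, so caching scores per individual is worthwhile: this sketch re-evaluates the survivors every generation.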
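For reference, the four reported metrics can be computed as in the following sketch: ACC, F-value and MCC from the counts of a binary confusion matrix, and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counted as one half). The function names `binary_metrics` and `auc` are illustrative, and the counts in the usage note are made up; they are unrelated to the benchmark figures quoted above.

```python
import math
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """ACC, F-value and MCC from the counts of a binary confusion matrix."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f_value, "MCC": mcc}


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise ranking; a tie between a positive and a negative counts 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with a hypothetical 90 true positives, 5 false positives, 95 true negatives and 10 false negatives, `binary_metrics(90, 5, 95, 10)` gives ACC = 185/200 = 0.925, F-value = 180/195 ≈ 0.923 and MCC ≈ 0.851. MCC uses all four cells of the confusion matrix, which is one reason it can move more than ACC or F-value between methods. The pairwise `auc` is quadratic in the number of samples; a rank-based formula is preferable for large test sets.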
Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit that makes T3SE prediction convenient for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.

AVAILABILITY AND IMPLEMENTATION http://bastion3.erc.monash.edu/.

CONTACT selkrig@embl.de or wyztli@163.com or or trevor.lithgow@monash.edu.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
