T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors

ABSTRACT Many Gram-negative bacteria infect hosts and cause disease by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. Despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, and a voting-based ensemble model was then developed to integrate the individual prediction results. We assembled these components into a unified T3SE prediction pipeline, T3SEpp, which integrates the results of the individual modules, achieving high accuracy (∼0.94) and a >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions.

IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively narrow range of biological features used to design and train the models. In this research, we comprehensively surveyed the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor binding promoter sites, and assembled a unified prediction pipeline that integrates these multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, this is the most comprehensive biological sequence feature analysis compiled for T3SEs. The T3SEpp pipeline, which integrates this variety of features and combines different models, showed high accuracy and should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
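To illustrate the voting-based ensemble idea and the false-positive rate mentioned above, here is a minimal Python sketch. It is not the T3SEpp implementation: the simple majority rule, the module names, and the toy labels are all assumptions made for illustration, and the actual pipeline may weight or combine its modules differently.

```python
from typing import Dict, List


def majority_vote(module_calls: Dict[str, bool]) -> bool:
    """Return True when strictly more than half of the modules call a T3SE.

    Assumption: an unweighted majority rule; ties count as negative.
    """
    positives = sum(module_calls.values())
    return positives * 2 > len(module_calls)


def false_positive_rate(predicted: List[bool], actual: List[bool]) -> float:
    """FPR = false positives / all true negatives-class proteins (non-T3SEs)."""
    false_positives = sum(p and not a for p, a in zip(predicted, actual))
    negatives = sum(not a for a in actual)
    return false_positives / negatives if negatives else 0.0


# Hypothetical per-module calls for one protein (names are illustrative only).
calls = {"signal_model_a": True, "signal_model_b": True, "signal_model_c": False}
is_t3se = majority_vote(calls)

# Toy evaluation: four proteins, of which only the first is a real T3SE.
predicted = [True, True, False, False]
actual = [True, False, False, False]
fpr = false_positive_rate(predicted, actual)
```

Requiring agreement among several modules is one way an ensemble can suppress false positives that any single model would report on its own, at the cost of possibly missing effectors that only one module detects.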
