Sequence-Based Prediction of Type III Secreted Proteins

The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals, including humans. The TTSS acts as a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, the recognition and targeting of type III secreted proteins have so far been poorly understood. Several hypotheses have been proposed, including an mRNA-based signal, a chaperone-mediated process, and an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of the N-termini of 100 experimentally verified effector proteins. Based on this analysis, we developed a machine-learning approach for the prediction of TTSS effector proteins that takes into account N-terminal sequence features such as the frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with a sensitivity of ∼71% and a selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins even when the respective group was excluded from training. Applying our prediction approach to 739 complete bacterial and archaeal genome sequences identified between 0% and 12% of proteins as putative TTSS effectors. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting that convergent evolutionary processes shape the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors.
Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen–host interactions.
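
The kind of N-terminal sequence features described above (frequencies of amino acids and of residues with certain physico-chemical properties) can be illustrated with a minimal sketch. This is not the EffectiveT3 implementation: the function name `n_terminal_features`, the window length of 25 residues, and the grouping of residues into property classes are assumptions made only for this example.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical property grouping, chosen for illustration only;
# it is not taken from the study.
PROPERTY_GROUPS = {
    "polar": set("STNQYC"),
    "hydrophobic": set("AVLIMFWPG"),
    "charged": set("DEKRH"),
}


def n_terminal_features(sequence, window=25):
    """Return relative frequencies computed over the first `window` residues.

    `window=25` is an assumed default, not a value reported in the text.
    """
    n_term = sequence[:window].upper()
    if not n_term:
        raise ValueError("sequence must not be empty")
    counts = Counter(n_term)
    length = len(n_term)
    # One feature per amino acid: its relative frequency in the N-terminus.
    features = {f"freq_{aa}": counts[aa] / length for aa in AMINO_ACIDS}
    # One feature per property class: fraction of residues in that class.
    for name, members in PROPERTY_GROUPS.items():
        features[f"frac_{name}"] = sum(counts[aa] for aa in members) / length
    return features
```

A feature dictionary of this form could be computed for known effectors and for non-effector proteins and then passed to any standard classifier. Frequencies of short peptides could be added in the same way by counting overlapping substrings instead of single residues.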

[28]  Andrew McCallum,et al.  A comparison of event models for naive bayes text classification , 1998, AAAI 1998.

[29]  Seema Mattoo,et al.  A genome‐wide screen identifies a Bordetella type III secretion effector and candidate effectors in other species , 2005, Molecular microbiology.

[30]  Thomas Rattei,et al.  SIMAP—structuring the network of protein similarities , 2007, Nucleic Acids Res..

[31]  O. Schneewind,et al.  Yersinia enterocolitica type III secretion: an mRNA signal that couples translation and secretion of YopQ , 1999, Molecular microbiology.

[32]  Evan D. Brutinel,et al.  Control of gene expression by type III secretory activity. , 2008, Current opinion in microbiology.

[33]  S Falkow,et al.  The Salmonella invasin SipB induces macrophage apoptosis by binding to caspase-1. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[34]  J. Kaper,et al.  The N‐terminus of enteropathogenic Escherichia coli (EPEC) Tir mediates transport across bacterial and eukaryotic cell membranes , 2002, Molecular microbiology.

[35]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[36]  K. Ramamurthi,et al.  Substrate recognition by the Yersinia type III protein secretion machinery , 2003, Molecular microbiology.

[37]  Tetsuya Hayashi,et al.  An extensive repertoire of type III secretion effectors in Escherichia coli O157 and the role of lambdoid phages in their dissemination , 2006, Proceedings of the National Academy of Sciences.

[38]  Dmitrij Frishman,et al.  PROMPT: a protein mapping and comparison tool , 2006, BMC Bioinformatics.

[39]  S. Sathiya Keerthi,et al.  Improvements to Platt's SMO Algorithm for SVM Classifier Design , 2001, Neural Computation.

[40]  Mark A. Hall,et al.  Correlation-based Feature Selection for Machine Learning , 2003 .

[41]  Kumaran S Ramamurthi,et al.  Yersinia yopQ mRNA encodes a bipartite type III secretion signal in the first 15 codons , 2003, Molecular microbiology.

[42]  G. Martin,et al.  Comparative Genomics of Host-Specific Virulence in Pseudomonas syringae , 2006, Genetics.

[43]  David T. Jones,et al.  Improving the accuracy of transmembrane protein topology prediction using evolutionary information , 2007, Bioinform..

[44]  Christian von Mering,et al.  STRING 7—recent developments in the integration and prediction of protein interactions , 2006, Nucleic Acids Res..

[45]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[46]  David W. Aha,et al.  Instance-Based Learning Algorithms , 1991, Machine Learning.

[47]  David S Guttman,et al.  A functional screen for the type III (Hrp) secretome of the plant pathogen Pseudomonas syringae. , 2002, Science.

[48]  N. Strynadka,et al.  Piecing together the type III injectisome of bacterial pathogens. , 2008, Current opinion in structural biology.

[49]  K. Hughes,et al.  Type III secretion: a secretory pathway serving both motility and virulence (Review) , 2005, Molecular membrane biology.

[50]  Yoshihiro Yamanishi,et al.  KEGG for linking genomes to life and the environment , 2007, Nucleic Acids Res..

[51]  G. Cornelis,et al.  The bacterial injection kit: Type III secretion systems , 2005, Annals of medicine.

[52]  K. Ramamurthi,et al.  Yersinia enterocolitica Type III Secretion: Mutational Analysis of the yopQ Secretion Signal , 2002, Journal of bacteriology.

[53]  Christian von Mering,et al.  eggNOG: automated construction and annotation of orthologous groups of genes , 2007, Nucleic Acids Res..

[54]  David S Guttman,et al.  Terminal Reassortment Drives the Quantum Evolution of Type III Effectors in Bacterial Pathogens , 2006, PLoS pathogens.

[55]  David R. Karger,et al.  Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.

[56]  Alan Collmer,et al.  Genomewide identification of proteins secreted by the Hrp type III protein secretion system of Pseudomonas syringae pv. tomato DC3000 , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[57]  Tomoko Kubori,et al.  Molecular and functional analysis of the type III secretion signal of the Salmonella enterica InvJ protein , 2002, Molecular microbiology.

[58]  N. Moran,et al.  Evolutionary Origins of Genomic Repertoires in Bacteria , 2005, PLoS biology.

[59]  K. Rajakumar,et al.  Distribution and structural variation of the she pathogenicity island in enteric bacterial pathogens. , 2001, Journal of medical microbiology.