An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach

Type IV secretion systems (T4SSs) are multi-protein complexes, found in a number of bacterial pathogens, that can translocate proteins and DNA into host cells. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Once inside, these effectors manipulate the host cell's machinery for the pathogen's benefit, which can result in serious illness or death of the host. For this reason, recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Good predictions of effectors will help focus experimental validation and reduce testing costs. In recent years, several scoring and machine learning-based methods have been proposed for predicting T4SS effector proteins. These methods have used different sets of features, and their predictions have been inconsistent. In this paper, an optimal set of features for predicting T4SS effector proteins is presented, derived using a statistical approach. A thorough literature search was performed to identify previously proposed features. Feature values were calculated for datasets of known effectors and non-effectors from T4SS-containing pathogens in four genera with a sufficient number of known effectors: Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlated features among those remaining were removed, and dimensionality reduction was performed using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building and evaluating logistic regression models.
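The multi-level selection described above (rank, filter, remove correlated features, reduce dimensionality) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the ranking statistic (absolute Welch t-statistic), the correlation cutoff of 0.8, the `top_k` filter size, and all function names are hypothetical, and the factor-analysis step is omitted.

```python
import numpy as np


def rank_features(X, y):
    """Order features by absolute Welch t-statistic between effectors (y == 1)
    and non-effectors (y == 0); the choice of statistic is an assumption."""
    pos, neg = X[y == 1], X[y == 0]
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    se = np.sqrt(pos.var(axis=0, ddof=1) / len(pos) + neg.var(axis=0, ddof=1) / len(neg))
    t = np.abs(diff) / np.where(se > 0, se, np.inf)  # constant features score 0
    return np.argsort(-t)  # most discriminative feature first


def drop_correlated(X, order, threshold=0.8):
    """Walk features in rank order and keep one only if its absolute Pearson
    correlation with every already-kept feature is below the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in order:
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(int(j))
    return kept


def pca_scores(X, n_components):
    """Project standardized data onto its leading principal components via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return Z @ vt[:n_components].T, explained[:n_components]


def select_features(X, y, top_k=10, corr_threshold=0.8, n_components=3):
    order = rank_features(X, y)[:top_k]               # rank, then keep the top_k
    kept = drop_correlated(X, order, corr_threshold)  # remove redundant features
    scores, explained = pca_scores(X[:, kept], min(n_components, len(kept)))
    return kept, scores, explained


# Synthetic example: two informative features, one near-duplicate, five noise features.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
informative = 1.5 * y[:, None] + rng.normal(size=(n, 2))         # columns 0, 1
duplicate = informative[:, :1] + 0.05 * rng.normal(size=(n, 1))  # column 2
noise = rng.normal(size=(n, 5))                                   # columns 3-7
X = np.hstack([informative, duplicate, noise])

kept, scores, explained = select_features(X, y, top_k=5, n_components=2)
print(kept, scores.shape, explained.round(2))
```

Because features are visited in rank order, only the more discriminative member of a highly correlated pair (here column 0 or its near-duplicate, column 2) survives the correlation filter.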
Evaluation of our logistic regression models confirms the effectiveness of the four optimal feature sets, and based on these, an optimal set of features is proposed for all T4SS effector proteins.
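The final step, choosing features by building and evaluating logistic regression models, might look like the sketch below. It assumes a single 70/30 hold-out split, selects the candidate feature subset with the highest held-out ROC AUC, and fits each model by plain gradient descent. The split, the selection criterion, the 0.5 probability cutoff, the synthetic data, and every function name are hypothetical choices for illustration, not the procedure used in the paper.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def roc_auc(y, p):
    """Probability that a random effector scores above a random non-effector."""
    pos, neg = p[y == 1], p[y == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def evaluate(y, p, cutoff=0.5):
    pred = (p >= cutoff).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(y), "auc": roc_auc(y, p)}


def best_subset(X, y, candidates, train_frac=0.7, seed=0):
    """Fit one model per candidate feature subset on a training split and
    return the subset with the highest held-out AUC, plus all metrics."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    train_idx, test_idx = idx[:cut], idx[cut:]
    results = {}
    for subset in candidates:
        cols = list(subset)
        w = fit_logistic(X[train_idx][:, cols], y[train_idx])
        probs = predict_proba(w, X[test_idx][:, cols])
        results[tuple(cols)] = evaluate(y[test_idx], probs)
    best = max(results, key=lambda s: results[s]["auc"])
    return best, results


# Synthetic example: column 0 is strongly informative, 1 is noise, 2 is weakly informative.
rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 2, size=n)
X = np.column_stack([2.0 * y + rng.normal(size=n),
                     rng.normal(size=n),
                     1.0 * y + rng.normal(size=n)])

best, results = best_subset(X, y, candidates=[(1,), (0,), (0, 2)])
print(best, {k: round(float(v["auc"]), 3) for k, v in results.items()})
```

Reporting sensitivity and specificity next to AUC shows how a chosen model trades missed effectors against false positives at a given cutoff, which AUC alone does not reveal.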
