Determining Optimal Features for Predicting Type IV Secretion System Effector Proteins for Coxiella burnetii

Type IV secretion systems (T4SS) are multi-protein complexes found in some bacterial pathogens, where they deliver type IV effector proteins into host cells. Effectors target eukaryotic cells and manipulate host cell processes and the host immune system. Some effectors have been validated experimentally, and several scoring and machine learning-based methods have recently been developed to predict effectors from whole genome sequences. However, these studies have proposed different types of features as effective. In this work, we gathered the features proposed in previous reports and calculated their values for a dataset of effectors and non-effectors of Coxiella burnetii. We then ranked the features by their importance in distinguishing effectors from non-effectors to determine an optimal feature set. Finally, we developed a Support Vector Machine model to test the optimal features by comparing them with a set of features proposed in a previous study. The outcome of the comparison supports the effectiveness of our optimal features.
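The feature-ranking step can be illustrated with a minimal sketch. The abstract does not state which ranking method or which features were used, so everything here is an assumption made for illustration: the Fisher score as the ranking criterion, the four simple sequence features, the function names, and the toy sequences (which are hypothetical, not real Coxiella burnetii proteins). The hydropathy values are the standard Kyte-Doolittle scale. The Support Vector Machine comparison is not covered.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def features(seq):
    """Compute a few illustrative per-protein features (assumed, not the paper's set)."""
    return {
        "mean_hydropathy": mean(KD[a] for a in seq),
        "frac_glu": seq.count("E") / len(seq),
        "frac_lys": seq.count("K") / len(seq),
        "length": float(len(seq)),
    }


def fisher_score(pos, neg):
    """Squared difference of class means divided by the sum of class variances."""
    diff = mean(pos) - mean(neg)
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # No within-class spread: perfectly separating if the means differ.
        return float("inf") if diff != 0 else 0.0
    return diff ** 2 / denom


def rank_features(effectors, non_effectors):
    """Return (feature name, score) pairs sorted from most to least discriminative."""
    pos = [features(s) for s in effectors]
    neg = [features(s) for s in non_effectors]
    scores = {
        name: fisher_score([f[name] for f in pos], [f[name] for f in neg])
        for name in pos[0]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy sequences, for illustration only.
effectors = ["MKEEELKKEE", "MSEEKKELSE", "MEEKEELKKE"]
non_effectors = ["MLLIVAVLGA", "MAVLLIGGAV", "MIVLAGLLAV"]

ranked = rank_features(effectors, non_effectors)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy data every sequence has the same length, so the length feature scores zero and ranks last. A top-k subset of such a ranking could then be passed to a classifier to compare against another feature set.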
