SCMBYK: prediction and characterization of bacterial tyrosine-kinases based on propensity scores of dipeptides

Background: Bacterial tyrosine-kinases (BY-kinases), which play an important role in numerous cellular processes, are characterized as a separate class of enzymes and share no structural similarity with their eukaryotic counterparts. However, in silico methods for predicting BY-kinases have not yet been developed. Since these enzymes are involved in key regulatory processes and are promising targets for anti-bacterial drug design, it is desirable to develop a simple and easily interpretable predictor to gain new insights into bacterial tyrosine phosphorylation. This study proposes a novel method, SCMBYK, for predicting and characterizing BY-kinases.

Results: A dataset consisting of 797 BY-kinases and 783 non-BY-kinases was established to design the SCMBYK predictor, which achieved training and test accuracies of 97.55% and 96.73%, respectively. Furthermore, the leave-one-phylum-out method was used to predict specific bacterial phyla hosts of target sequences, achieving an average test accuracy of 97.39%. After analyzing SCMBYK-derived propensity scores, four characteristics of BY-kinases were determined: 1) BY-kinases tend to be composed of α-helices; 2) the amino-acid content of extracellular regions of BY-kinases is expected to be dominated by residues such as Val, Ile, Phe and Tyr; 3) BY-kinases structurally resemble nuclear proteins; 4) different domains play different roles in triggering BY-kinase activity.

Conclusions: The SCMBYK predictor is an effective method for identifying possible BY-kinases. Furthermore, it can be used as part of a novel drug repurposing method, which recognizes putative BY-kinases and matches them to approved drugs. Among other results, our analysis revealed that azathioprine could suppress the virulence of M. tuberculosis and could thus be considered a potential antibiotic for tuberculosis treatment.
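To make the idea of scoring a sequence by dipeptide propensity scores more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a sequence is scored as the sum of its dipeptide frequencies weighted by per-dipeptide propensity scores, and that the score is compared with a threshold. The function names, the toy propensity values and the threshold are all hypothetical; how SCMBYK actually estimates and optimizes its scores is not described in this abstract and is left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: count / total for dp, count in counts.items()}


def scoring_card_score(sequence, propensity):
    """Composition-weighted sum of dipeptide propensity scores.

    `propensity` maps dipeptides to scores; missing dipeptides count as 0.
    """
    composition = dipeptide_composition(sequence)
    return sum(composition[dp] * propensity.get(dp, 0.0) for dp in DIPEPTIDES)


def predict_by_threshold(sequence, propensity, threshold):
    """Hypothetical decision rule: positive if the score exceeds the threshold."""
    return scoring_card_score(sequence, propensity) > threshold


# Toy values for illustration only; they are not SCMBYK propensity scores.
toy_propensity = {"AL": 800.0, "LA": 600.0, "GG": 100.0}
toy_threshold = 500.0
print(predict_by_threshold("ALAL", toy_propensity, toy_threshold))
```

Because each dipeptide contributes to the score in proportion to its frequency, the contribution of individual dipeptides can be read off directly, which is one way a predictor of this kind stays easy to interpret.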
