BlaPred: Predicting and classifying β-lactamase using a 3-tier prediction system via Chou's general PseAAC.

β-Lactam antibiotics account for nearly half of global antibiotic use. The β-lactamase enzyme is a major element of the bacterial arsenal for escaping the lethal effect of β-lactam antibiotics, and different variants of β-lactamases have evolved to counter the different types of β-lactam antibiotics. Extensive research has been done to isolate and characterize these variants. Unfortunately, identification and classification of β-lactamases currently rely purely on experiments, which are both time- and resource-consuming. Thus, there is a need for fast and accurate computational methods to identify and classify new β-lactamase enzymes from the avalanche of sequence data generated in the post-genomic era. Based on these considerations, we have developed BlaPred, a support vector machine-based three-tier prediction system that predicts and classifies β-lactamases (per the Ambler classification) solely from their protein sequences. The input features were amino acid composition and classic and amphiphilic pseudo amino acid compositions. The classic pseudo amino acid composition-based models performed better than the other models. Under leave-one-out cross-validation, the accuracy for discriminating β-lactamases from non-β-lactamases was 93.57% (tier-I); the accuracies for predicting classes A, B, C and D were 93.27%, 95.52%, 96.86% and 97.31%, respectively (tier-II); and the accuracies for predicting subclasses B1, B2 and B3 were 84.78%, 95.65% and 89.13%, respectively (tier-III). Comparative results on an independent dataset suggest that our method distinguishes β-lactamases from non-β-lactamases efficiently, with an overall accuracy of 93.09%, and further classifies β-lactamase sequences into their respective Ambler classes and subclasses with accuracies higher than 92% and 87%, respectively. On an independent benchmark dataset, BlaPred also shows a significant improvement over other existing methods. Finally, BlaPred is available as a webserver and as standalone software at http://proteininformatics.org/mkumar/blapred.
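
The tiered design summarized above can be illustrated with a minimal Python sketch. It is not BlaPred's implementation. It computes only the plain amino acid composition (the simplest of the three feature types named in the abstract) and routes a sequence through three pluggable classifiers. The function names, the label strings, and the choice to ignore non-standard residue codes are assumptions made for illustration. In BlaPred each tier is a trained support vector machine, represented here by arbitrary callables.

```python
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue (a 20-dimensional vector).

    Assumption: non-standard codes (e.g. X, B, Z) are ignored, so the
    fractions are taken over standard residues only.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]

def three_tier_predict(
    features: List[float],
    tier1: Callable[[List[float]], bool],  # True if predicted beta-lactamase
    tier2: Callable[[List[float]], str],   # returns "A", "B", "C" or "D"
    tier3: Callable[[List[float]], str],   # returns "B1", "B2" or "B3"
) -> str:
    """Cascade: tier-I filters, tier-II assigns the Ambler class,
    tier-III refines class B into a subclass."""
    if not tier1(features):
        return "non-beta-lactamase"
    ambler_class = tier2(features)
    if ambler_class != "B":
        return ambler_class
    return tier3(features)

if __name__ == "__main__":
    # Hypothetical stand-ins for trained models, used only to show the flow.
    feats = amino_acid_composition("MKKLLAAC")
    print(three_tier_predict(feats, lambda f: True, lambda f: "B", lambda f: "B1"))
```

Only sequences accepted at tier-I reach tier-II, and only class B predictions reach tier-III, which mirrors the order in which the accuracies are reported.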
