PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method

Although existing methods have successfully predicted phage (bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machines and naïve Bayes, these classifiers offer little interpretability. Yet characterizing and analyzing PVPs may be of great significance for understanding the molecular mechanisms of bacteriophage genetics and for developing antibacterial drugs. Hence, we propose a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of the 400 dipeptides were calculated using a statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM, utilizing only dipeptide composition, yielded an accuracy of 77.56%, indicating that it performed well relative to the state-of-the-art method, which utilizes a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. In comparison, PVPred-SCM was superior to existing methods in terms of simplicity, interpretability, and ease of implementation. Finally, to facilitate high-throughput prediction of PVPs, we provide a user-friendly web server that estimates the likelihood that query sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
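To make the idea of scoring a sequence from dipeptide propensities concrete, the following is a minimal sketch, not the published PVPred-SCM implementation. It assumes that an initial propensity score for each of the 400 dipeptides can be taken as the difference in mean dipeptide composition between PVP and non-PVP training sequences, rescaled to 0-1000, and that a sequence score is the composition-weighted sum of those scores. The function names, the rescaling range, and the toy sequences are all hypothetical, and any further optimization of the scores is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 possible dipeptides.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the fraction of each dipeptide among the valid dipeptides in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        dp = seq[i:i + 2]
        if dp in counts:  # skip dipeptides containing non-standard residues
            counts[dp] += 1
            total += 1
    if total == 0:
        return dict.fromkeys(DIPEPTIDES, 0.0)
    return {dp: c / total for dp, c in counts.items()}


def initial_propensity_scores(positives, negatives):
    """Assumed statistical estimate: mean composition in positives minus
    mean composition in negatives, min-max rescaled to the range 0-1000."""
    def mean_comp(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}

    pos, neg = mean_comp(positives), mean_comp(negatives)
    diff = {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all differences are equal
    return {dp: 1000.0 * (diff[dp] - lo) / span for dp in DIPEPTIDES}


def scm_score(seq, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * scores[dp] for dp in DIPEPTIDES)


# Toy, made-up sequences purely for illustration (not real PVP data).
toy_scores = initial_propensity_scores(["AAAA"], ["CCCC"])
toy_pos = scm_score("AAAA", toy_scores)
toy_neg = scm_score("CCCC", toy_scores)
```

In a sketch like this, a query sequence would be called a PVP when its score exceeds a threshold chosen on training data. Because each dipeptide carries a single readable score, the highest- and lowest-scoring dipeptides can be inspected directly, which is the interpretability advantage the abstract refers to.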
