iPVP-MCV: A Multi-Classifier Voting Model for the Accurate Identification of Phage Virion Proteins

The classic structure of a bacteriophage is commonly characterized by complex symmetry: the head has icosahedral symmetry, whereas the tail has helical symmetry. Phage virion proteins (PVPs), a class of bacteriophage structural proteins, are essential components of infectious viral particles and are responsible for multiple biological functions. Accurate identification of PVPs is important for understanding the interactions between phages and their host bacteria and for developing new antimicrobial drugs or antibiotics. However, traditional experimental approaches for identifying PVPs are often time-consuming and laborious, so computational methods that can identify PVPs efficiently and accurately are needed. In this study, we propose a multi-classifier voting model, iPVP-MCV, to improve the prediction of PVPs from their amino acid sequences. First, three types of evolutionary features were extracted from position-specific scoring matrix (PSSM) profiles to represent PVPs and non-PVPs. Then, a set of baseline models was trained using the support vector machine (SVM) algorithm, each combined with one type of feature descriptor. Finally, the outputs of these baseline models were integrated by majority voting to construct iPVP-MCV. On a rigorous independent dataset test, iPVP-MCV outperformed existing methods.
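The final step of the pipeline, combining the per-feature-type baseline models by majority voting, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the iPVP-MCV implementation. The abstract does not name the three PSSM feature types, so `pssm_composition` (the average of each PSSM column over all residues) is an assumed example descriptor. The trained SVMs are replaced by hypothetical stand-in callables. The label convention (1 = PVP, 0 = non-PVP) and the tie rule for an even number of voters are choices made only for this sketch.

```python
from typing import Callable, List, Sequence

# Assumed label convention for this sketch: 1 = PVP, 0 = non-PVP.
# A baseline model maps one feature vector to a label.
Baseline = Callable[[Sequence[float]], int]


def pssm_composition(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Average each of the 20 PSSM columns over all residues (L x 20 -> 20).

    An example fixed-length PSSM descriptor, assumed here for illustration;
    it is not necessarily one of the three feature types used by iPVP-MCV.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one residue row")
    n_rows = len(pssm)
    return [sum(row[j] for row in pssm) / n_rows for j in range(20)]


def majority_vote(labels: Sequence[int]) -> int:
    """Return 1 if more than half of the baseline labels are 1, else 0.

    With an odd number of voters no tie can occur; with an even number a tie
    goes to 0 (non-PVP), an arbitrary choice made for this sketch.
    """
    if not labels:
        raise ValueError("need at least one baseline prediction")
    positives = sum(1 for y in labels if y == 1)
    return 1 if positives > len(labels) / 2 else 0


def ensemble_predict(features: Sequence[Sequence[float]],
                     baselines: Sequence[Baseline]) -> int:
    """Apply baseline model i to feature vector i, then take the majority vote."""
    if len(features) != len(baselines):
        raise ValueError("expected one feature vector per baseline model")
    votes = [model(x) for model, x in zip(baselines, features)]
    return majority_vote(votes)


if __name__ == "__main__":
    # Toy 3-residue PSSM with 20 columns; the values are made up.
    toy_pssm = [[float(j - 10) for j in range(20)] for _ in range(3)]
    comp = pssm_composition(toy_pssm)

    # Hypothetical stand-ins for three trained SVMs, one per feature type.
    # Each real baseline would see a different descriptor; the same vector
    # is reused here only to keep the toy example short.
    baselines = [
        lambda x: int(sum(x) > 0),
        lambda x: int(x[0] > 0),
        lambda x: int(max(x) > 5),
    ]
    print(ensemble_predict([comp, comp, comp], baselines))  # prints 0
```

In practice each baseline would be an SVM (for example scikit-learn's `SVC`) fitted on its own feature type. scikit-learn's `VotingClassifier(voting="hard")` applies the same majority rule, but it passes one shared feature matrix to every estimator, so per-feature-type baselines would each need a column-selecting pipeline in front of them.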
