Computational prediction of therapeutic peptides based on graph index

Therapeutic peptides have attracted growing attention in disease therapy in recent years, and biologists have spent considerable time and labor verifying functional peptides among large numbers of candidate sequences. To reduce this workload and make the identification of functional peptides more efficient, we propose a sequence-based model, q-FP (functional peptide prediction based on the q-Wiener index), that recognizes potentially functional peptides and proteins. We extract three types of features by combining graphical representations of sequences with statistical indices derived from the q-Wiener index and the physicochemical properties of amino acids. Under 5-fold cross-validation, our support-vector-machine-based model achieves accuracies of 96.71%, 93.34%, 98.40%, and 91.40% across the anticancer, virulent, and allergenic protein datasets.
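To make the central quantity concrete, here is a minimal sketch of one common form of the q-Wiener index, W_1(G, q), which sums the q-analog [d]_q = 1 + q + ... + q^(d-1) of the shortest-path distance d over all vertex pairs and reduces to the ordinary Wiener index as q approaches 1. The abstract does not say how a peptide sequence is mapped to a graph, so the path graph used here is only a hypothetical stand-in, and the helper names (`q_number`, `bfs_distances`, `q_wiener_index`, `path_graph`) are illustrative assumptions rather than part of q-FP.

```python
from collections import deque


def q_number(k, q):
    """q-analog of the integer k: [k]_q = 1 + q + ... + q**(k-1)."""
    return sum(q ** i for i in range(k))


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def q_wiener_index(adj, q):
    """Sum of [d(u, v)]_q over all unordered pairs of connected vertices."""
    total = 0.0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if u < v:  # count each unordered pair once
                total += q_number(d, q)
    return total


def path_graph(n):
    """Hypothetical stand-in graph: n vertices joined in a chain."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    g = path_graph(4)
    print(q_wiener_index(g, 1.0))  # ordinary Wiener index of the 4-vertex path
    print(q_wiener_index(g, 0.5))  # same graph with long distances damped
```

With q below 1, long distances contribute less than they do in the ordinary Wiener index, so varying q gives a family of descriptors for the same graph. In a pipeline like the one described, such values would be one ingredient of the feature vector passed to the classifier.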
