mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides

Anticancer peptides (ACPs) are promising therapeutic agents for targeting and killing cancer cells. The accurate prediction of ACPs from peptide sequences remains an open problem in immunoinformatics. Recently, machine learning algorithms have emerged as a promising tool for helping experimental scientists predict ACPs. However, the performance of existing methods still needs to be improved. In this study, we present a novel approach for the accurate prediction of ACPs, which involves two steps: (i) We applied a two-step feature selection protocol to seven feature encodings that cover various aspects of sequence information (composition, physicochemical properties, and profiles) and obtained the corresponding optimal feature-based model for each encoding. The ACP probabilities predicted by these models were then used as feature vectors. (ii) These predicted-probability feature vectors were in turn used as input to a support vector machine to develop the final prediction model, called mACPpred. Cross-validation analysis showed that the proposed predictor performs significantly better than models based on individual feature encodings. Furthermore, mACPpred significantly outperformed the existing methods compared in this study when objectively evaluated on an independent dataset.
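The two-layer scheme described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' implementation. Several things here are assumptions made for illustration: only two simple encodings (`aac` and `group_composition`) stand in for the seven used in the study, the residue groups in `GROUPS` are hypothetical, the feature selection step is omitted, the peptides are deliberately easy synthetic sequences from `toy_peptides`, and the first-layer probabilities are taken out-of-fold with `cross_val_predict`, which the abstract does not specify.

```python
import random

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative residue groups (an assumption, not the paper's property set):
# hydrophobic, positively charged, negatively charged, polar/other.
GROUPS = ("AVLIMFWC", "KRH", "DE", "STNQYGP")


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def group_composition(seq):
    """Fraction of residues that fall into each group in GROUPS."""
    return [sum(seq.count(a) for a in g) / len(seq) for g in GROUPS]


# Stand-ins for the seven feature encodings mentioned in the abstract.
ENCODINGS = {"aac": aac, "groups": group_composition}


def toy_peptides(n_per_class, seed=0):
    """Synthetic, trivially separable peptides; NOT real ACP data."""
    rng = random.Random(seed)

    def draw(alphabet):
        length = rng.randint(10, 30)
        return "".join(rng.choice(alphabet) for _ in range(length))

    positives = [draw("KLRAWFI") for _ in range(n_per_class)]
    negatives = [draw("DESGTNQP") for _ in range(n_per_class)]
    labels = np.array([1] * n_per_class + [0] * n_per_class)
    return positives + negatives, labels


def probability_features(seqs, labels):
    """Layer 1: one SVM per encoding; return one probability column each."""
    columns = []
    for encode in ENCODINGS.values():
        X = np.array([encode(s) for s in seqs])
        base = SVC(kernel="rbf", probability=True, random_state=0)
        # Out-of-fold probabilities, so each sample is scored by a model
        # that did not see it during training (an assumption of this sketch).
        proba = cross_val_predict(base, X, labels, cv=5, method="predict_proba")
        columns.append(proba[:, 1])  # column 1 is P(label == 1)
    return np.column_stack(columns)


seqs, labels = toy_peptides(40)
meta_features = probability_features(seqs, labels)

# Layer 2: a final SVM trained on the predicted-probability feature vectors.
final_model = SVC(kernel="rbf", probability=True, random_state=0)
final_model.fit(meta_features, labels)

# Training-set accuracy on toy data; not an estimate of real performance.
accuracy = final_model.score(meta_features, labels)
```

Here each row of `meta_features` has one entry per encoding, so with all seven encodings the final SVM would operate on a seven-dimensional probability vector. A realistic evaluation would score `final_model` on an independent dataset, as the abstract describes, instead of on its own training data.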
