Intelligent computational method for discrimination of anticancer peptides by incorporating sequential and evolutionary profiles information

Abstract Cancer is one of the leading threats to human life worldwide. Conventional treatments such as chemotherapy, radiation and surgery are widely used, but they are challenging to administer and have severe side effects on the human body. Recently, anticancer peptides (ACPs) have emerged as influential anticancer agents because of their nontoxic nature and safe cellular uptake. Accordingly, much progress has been made in developing computational methods for ACP prediction to accelerate their use against cancer. However, challenges remain in discriminative feature representation, class imbalance and prediction performance. In this study, we report a novel predictor, TargetACP, that integrates sequential and evolutionary-profile information derived solely from primary protein sequences. The synthetic minority oversampling technique (SMOTE) is used to address the imbalance between minority (ACP) and majority (non-ACP) samples, and a support vector machine is employed as the classifier. Experimental results show that the predictor achieved an accuracy of 98.78% on the benchmark dataset using the jackknife cross-validation test. Its generalization capability was evaluated on an independent dataset, on which it yielded an accuracy of 94.66%. These results indicate that the model outperformed existing methods on the same datasets. We anticipate that TargetACP will help the pharmaceutical industry design new anticancer drugs and support the research community in developing new ideas in bioinformatics, proteomics and computational biology.
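As a rough illustration of the oversampling step mentioned in the abstract, the following minimal sketch generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified variant of SMOTE written with the Python standard library only, not the TargetACP implementation; the function name `smote`, its parameters and the toy two-dimensional feature vectors are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours.

    Simplified SMOTE-style sketch: the base sample is drawn at random for each
    synthetic point instead of iterating over every minority sample in turn.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy feature vectors standing in for ACP (minority)
# and non-ACP (majority) samples.
acp = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.25]]
non_acp = [[0.80, 0.90], [0.70, 0.85], [0.90, 0.75], [0.85, 0.95],
           [0.75, 0.70], [0.95, 0.80], [0.65, 0.90], [0.80, 0.60]]

# Oversample the minority class until both classes have the same size.
balanced_acp = acp + smote(acp, len(non_acp) - len(acp), k=3)
print(len(balanced_acp), len(non_acp))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class. In a setting like the one described, the balanced set would then be passed to the classifier.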
