TargetCPP: accurate prediction of cell-penetrating peptides from optimized multi-scale features using gradient boost decision tree

Cell-penetrating peptides (CPPs) are short, membrane-permeable peptides that have emerged as tools for delivering therapeutic agents, including genetic material and macromolecules, into cells. CPPs have recently become a research hotspot in the life sciences and have opened a new avenue for disease treatment, since their non-toxic nature leaves cell viability unharmed. Correct identification of CPPs can therefore provide valuable insights for medical applications. Given the shortcomings of traditional experimental identification, intelligent predictors are urgently needed to identify CPPs accurately among large numbers of uncharacterized sequences. We develop a novel computational method, called TargetCPP, that discriminates CPPs from non-CPPs with improved accuracy. In TargetCPP, peptide sequences are first encoded with four distinct methods: composite protein sequence representation (CPSR), composition, transition and distribution (CTD), split amino acid composition (SAAC), and information theory features. These feature vectors are fused, and the minimum redundancy maximum relevance (mRMR) feature selection method is applied to choose an optimal feature subset. Finally, predictive models are trained with several classification algorithms on the optimized features. Among these classifiers, the gradient boosting decision tree (GBDT) algorithm achieved excellent performance throughout the experiments. TargetCPP attained prediction accuracies of 93.54% on the jackknife test and 88.28% on the independent test. These empirical results demonstrate the superiority of the proposed method over state-of-the-art methods. We anticipate that this study will provide a strong foundation for large-scale prediction of CPPs and useful guidance for clinical therapy and medical applications.
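The encoding, fusion and mRMR steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not TargetCPP's implementation: it covers only SAAC, the composition and transition parts of CTD for one assumed hydrophobicity grouping, and Shannon entropy as a stand-in for the information theory features (CPSR is omitted). The segment length `k=4`, the residue groups, and the binarization of continuous features at their column mean before computing mutual information are all assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group hydrophobicity partition (polar / neutral / hydrophobic)
# of the kind used for CTD descriptors; TargetCPP's exact properties may differ.
HYDRO_GROUPS = {"RKEDQN": 1, "GASTPHY": 2, "CLVIMFW": 3}
GROUP_OF = {aa: g for members, g in HYDRO_GROUPS.items() for aa in members}


def aac(segment):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    counts = Counter(segment)
    n = len(segment)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def saac(seq, k=4):
    """Split AAC: compositions of the N-terminal k residues, the middle
    segment and the C-terminal k residues (k=4 is an assumed value)."""
    return aac(seq[:k]) + aac(seq[k:-k]) + aac(seq[-k:])


def ctd_ct(seq):
    """Composition and Transition parts of CTD for one property."""
    groups = [GROUP_OF[aa] for aa in seq]
    n = len(groups)
    comp = [groups.count(g) / n for g in (1, 2, 3)]
    # Count adjacent residue pairs whose groups differ (order-insensitive).
    changes = Counter(frozenset(p) for p in zip(groups, groups[1:]) if p[0] != p[1])
    denom = max(n - 1, 1)
    trans = [changes[frozenset(p)] / denom for p in ((1, 2), (1, 3), (2, 3))]
    return comp + trans


def shannon_entropy(seq):
    """Assumed information-theory feature: Shannon entropy of residue usage (bits)."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def encode(seq):
    """Fuse the encodings into one vector (60 SAAC + 6 CTD + 1 entropy = 67)."""
    return saac(seq) + ctd_ct(seq) + [shannon_entropy(seq)]


def mutual_info(x, y):
    """Mutual information (nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(X, y, k):
    """Greedy mRMR (MID form): each step picks the feature maximizing
    I(f; y) minus its mean I(f; s) over the already selected features s."""
    # Continuous features are binarized at their column mean (an assumption).
    cols = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / len(col)
        cols.append([int(v > mean) for v in col])
    relevance = [mutual_info(c, y) for c in cols]
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(cols)):
        remaining = [j for j in range(len(cols)) if j not in selected]
        best = max(remaining, key=lambda j: relevance[j] - sum(
            mutual_info(cols[j], cols[s]) for s in selected) / len(selected))
        selected.append(best)
    return selected
```

In use, the fused vectors of a labeled dataset, for example `[encode(s) for s in peptides]`, would be passed to `mrmr` together with 0/1 labels, and only the returned column indices kept. The redundancy term is what stops mRMR from picking a near-duplicate of a feature it has already chosen, even when that duplicate is highly relevant on its own.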

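The classification and evaluation stage, a GBDT classifier scored with the jackknife (leave-one-out) test, can be sketched in pure Python as below. This is an illustrative sketch, not the authors' code: the trees are depth-1 stumps, each leaf value is a single Newton step on the binary log-loss, and `n_rounds=50` and `lr=0.1` are placeholder hyperparameters. All function names are hypothetical; in practice a library implementation such as XGBoost or scikit-learn's `GradientBoostingClassifier` would normally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r):
    """Least-squares regression stump on residuals r.
    Returns (feature, threshold), or None if every feature is constant."""
    best = None
    for j in range(len(X[0])):
        vals = sorted({row[j] for row in X})
        for t in ((a + b) / 2 for a, b in zip(vals, vals[1:])):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            # Maximizing this quantity is equivalent to minimizing the split's SSE.
            gain = sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return None if best is None else best[1:]


def gbdt_fit(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with depth-1 trees on the binary log-loss."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log-loss
        split = fit_stump(X, r)
        if split is None:
            break
        j, t = split
        leaves = {}
        for side in (True, False):
            idx = [i for i, row in enumerate(X) if (row[j] <= t) == side]
            num = sum(r[i] for i in idx)
            den = sum(p[i] * (1 - p[i]) for i in idx)
            leaves[side] = num / den if den > 1e-12 else 0.0  # one Newton step
        stumps.append((j, t, leaves))
        F = [f + lr * leaves[row[j] <= t] for f, row in zip(F, X)]
    return f0, lr, stumps


def gbdt_predict(model, x):
    """Return 1 (CPP) if the boosted log-odds are positive, else 0."""
    f0, lr, stumps = model
    score = f0 + sum(lr * leaves[x[j] <= t] for j, t, leaves in stumps)
    return int(score > 0)


def jackknife_accuracy(X, y, **params):
    """Jackknife test: train on n-1 samples, predict the held-out one, repeat n times."""
    correct = 0
    for i in range(len(y)):
        model = gbdt_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **params)
        correct += gbdt_predict(model, X[i]) == y[i]
    return correct / len(y)
```

Because the jackknife refits the model once per sample, its total cost is the number of samples times the cost of one fit; an independent test, by contrast, fits once on the training set and scores a separate held-out set.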