CPPred-RF: A Sequence-based Predictor for Identifying Cell-Penetrating Peptides and Their Uptake Efficiency.

Cell-penetrating peptides (CPPs) have proven to be important drug-delivery vehicles, demonstrating their potential as therapeutic candidates. The past decade has witnessed rapid growth in CPP-based research. Recently, many computational efforts have been made to develop machine-learning-based methods for identifying CPPs. Although much progress has been made, existing methods still suffer from low feature representation capability, which limits further performance improvement. In this study, we propose a novel predictor called CPPred-RF, in which we integrate multiple sequence-based feature descriptors to capture the distinct information embedded in CPPs, employ a well-established feature selection technique to improve the feature representation, and, for the first time, construct a two-layer prediction framework based on the random forest algorithm. The jackknife results on benchmark data sets show that the proposed CPPred-RF is at least competitive with the state-of-the-art predictors. Moreover, we establish the first online web server that predicts CPPs and their uptake efficiency simultaneously. It is freely available at http://server.malab.cn/CPPred-RF .
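
The two-layer idea can be illustrated with a minimal Python sketch. It is not the CPPred-RF implementation: the abstract does not name the feature descriptors, so amino acid composition is used here as an assumed stand-in, and the function names `amino_acid_composition` and `two_layer_predict` are hypothetical. The sketch also assumes that the first layer decides CPP versus non-CPP and the second layer decides high versus low uptake efficiency, with each layer supplied as an already-trained classifier (for example, a random forest) wrapped as a callable.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A classifier is modelled as any callable mapping a feature vector to a boolean.
Classifier = Callable[[List[float]], bool]


def amino_acid_composition(seq: str) -> List[float]:
    """Return the fraction of each standard residue in the peptide.

    Assumed stand-in for the paper's descriptors. Non-standard characters
    count towards the length but have no slot of their own in the vector.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("peptide sequence must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def two_layer_predict(
    seq: str,
    is_cpp: Classifier,
    is_high_uptake: Classifier,
) -> Tuple[bool, Optional[bool]]:
    """Run a hypothetical two-layer cascade on one peptide.

    Layer 1 decides CPP vs. non-CPP. Layer 2 is consulted only for predicted
    CPPs and decides high vs. low uptake efficiency. For non-CPPs the second
    element of the result is None.
    """
    features = amino_acid_composition(seq)
    if not is_cpp(features):
        return (False, None)
    return (True, is_high_uptake(features))
```

In practice each callable would wrap a trained model, with feature selection applied before training; the cascade logic itself stays the same.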
