ACPred-Fuse: fusing multi-view information improves the prediction of anticancer peptides

Fast and accurate identification of peptides with potential anticancer activity from large-scale protein data remains a challenging task. In this study, we propose a new machine learning predictor, ACPred-Fuse, that can automatically and accurately predict whether a protein sequence, in peptide form, has anticancer activity. Specifically, we establish a feature representation learning model that explores the class and probabilistic information embedded in anticancer peptides (ACPs) by integrating a total of 29 different sequence-based feature descriptors. To make full use of the multi-view information, we further fuse the class and probabilistic features with handcrafted sequential features and then optimize the representation ability of the resulting multi-view features, which are ultimately used as input for training our prediction model. By comparing the multi-view features with existing feature descriptors, we demonstrate that the fused multi-view features are more discriminative in capturing the characteristics of ACPs. In addition, the information from the different views is complementary and contributes to the performance improvement. Finally, our benchmarking comparison shows that the proposed ACPred-Fuse is more precise and more promising for the identification of ACPs than existing predictors. To facilitate the use of the proposed predictor, we built a web server, which is freely available at http://server.malab.cn/ACPred-Fuse.
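
The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two toy descriptors standing in for the 29 used in the study, a nearest-centroid scorer swapped in as the base learner, and made-up training sequences that are not real ACP data; all function and variable names are hypothetical. Each descriptor trains one base model, each base model contributes a class label and a probability-like score, and these are concatenated with handcrafted features to form the multi-view vector.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def charge_hydro(seq):
    """Toy handcrafted descriptor (assumption): cationic and hydrophobic fractions."""
    n = len(seq)
    return [sum(c in "KRH" for c in seq) / n,
            sum(c in "AILMFVW" for c in seq) / n]

# Two descriptors stand in for the 29 mentioned in the abstract.
DESCRIPTORS = [aac, charge_hydro]

class CentroidModel:
    """Stand-in base learner (assumption): nearest centroid with a logistic score."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        # Score exceeds 0.5 when x lies closer to the positive centroid.
        d_neg = math.dist(x, self.centroids[0])
        d_pos = math.dist(x, self.centroids[1])
        return 1.0 / (1.0 + math.exp(-10.0 * (d_neg - d_pos)))

def fit_base_models(seqs, labels):
    """Train one base model per descriptor."""
    return [CentroidModel().fit([d(s) for s in seqs], labels) for d in DESCRIPTORS]

def multiview_features(seq, models):
    """Concatenate class features, probabilistic features, and handcrafted features."""
    probs = [m.predict_proba(d(seq)) for d, m in zip(DESCRIPTORS, models)]
    classes = [1.0 if p >= 0.5 else 0.0 for p in probs]
    return classes + probs + charge_hydro(seq)

# Made-up sequences for illustration only (1 = ACP-like, 0 = non-ACP-like).
train_seqs = ["KKLLKKLLKK", "RRWWRRWFKK", "DDEEGGSSTT", "EEDDSSGGNN"]
train_labels = [1, 1, 0, 0]

models = fit_base_models(train_seqs, train_labels)
feats = multiview_features("KKWLKKLLRK", models)
print(feats)
```

The resulting vector would then be passed to a final classifier. A careful implementation would usually compute the class and probabilistic features for training sequences out of fold, so that the final model does not see scores produced by base models trained on the same samples; the sketch skips this for brevity.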
