PSBP-SVM: A Machine Learning-Based Computational Identifier for Predicting Polystyrene Binding Peptides

Polystyrene binding peptides (PSBPs) play a key role in the immobilization process. The correct identification of PSBPs is the first step in all related work. In this paper, we propose a novel support vector machine-based bioinformatic identification model. The model comprises four machine learning steps: feature extraction, feature selection, model training, and optimization. In a five-fold cross-validation test, the model achieves a sensitivity (SN) of 90.38%, a specificity (SP) of 84.62%, an accuracy (ACC) of 87.50%, and an area under the ROC curve (AUC) of 0.90. It outperforms the state-of-the-art identifier in terms of SN and ACC while using a smaller feature set. Furthermore, we constructed a web server that hosts the proposed model, which is freely accessible at http://server.malab.cn/PSBP-SVM/index.jsp.
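For readers unfamiliar with the reported metrics, the following minimal sketch shows how SN, SP, and ACC are computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts used below are assumptions for illustration only: the abstract does not state the dataset size, and the counts were chosen simply because they reproduce the reported percentages.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, and accuracy as percentages.

    tp/fn: binding peptides predicted correctly / missed
    tn/fp: non-binding peptides predicted correctly / wrongly flagged
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = 100.0 * tn / (tn + fp)                    # specificity (recall on negatives)
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    return sn, sp, acc


# Hypothetical counts (NOT taken from the paper): 52 positives and
# 52 negatives, picked only because they match the reported figures.
sn, sp, acc = binary_metrics(tp=47, fn=5, tn=44, fp=8)
print(f"SN={sn:.2f}%  SP={sp:.2f}%  ACC={acc:.2f}%")
```

AUC is not derived from a single confusion matrix; it summarizes the ranking quality of the classifier's scores over all decision thresholds, which is why it is reported on a 0 to 1 scale rather than as a percentage.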
