Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms

Quorum-sensing peptides (QSPs) are signal molecules that are closely associated with diverse cellular processes, such as cell-cell communication and the regulation of gene expression, in Gram-positive bacteria. Identifying QSPs is therefore important for understanding their functional mechanisms in physiological processes. Machine learning algorithms have been developed for this purpose and show great potential for the reliable prediction of QSPs. In this study, we comprehensively review, evaluate and compare several sequence-based feature descriptors for peptide representation, together with several machine learning algorithms. To make effective use of existing feature descriptors, we apply a feature representation learning strategy that automatically learns the most discriminative features from them in a supervised way. Our results demonstrate that this strategy effectively captures the sequence determinants that characterize QSPs, thereby improving predictive performance. Furthermore, building on this feature representation learning strategy, we developed a predictor named QSPred-FL for the detection of QSPs in large-scale proteomic data. Benchmarking with 10-fold cross validation showed that QSPred-FL achieves better performance than state-of-the-art predictors. In addition, we have established a user-friendly webserver that implements QSPred-FL, which is currently available at http://server.malab.cn/QSPred-FL. We expect that this tool will be useful for the high-throughput prediction of QSPs and the discovery of important functional mechanisms of QSPs.
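
As a rough illustration of two ingredients mentioned in the abstract, a sequence-based feature descriptor and 10-fold cross validation, the following minimal Python sketch may help. It is not the QSPred-FL implementation. The abstract does not name the descriptors used, so amino acid composition is assumed here purely as a common example, and the function names `amino_acid_composition` and `k_fold_indices` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return a 20-dimensional vector of residue frequencies.

    Assumed example descriptor (not taken from the abstract).
    Non-standard residues count toward the length but get no entry.
    """
    if not peptide:
        raise ValueError("peptide must be a non-empty string")
    counts = Counter(peptide.upper())
    length = len(peptide)
    return [counts[aa] / length for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross validation.

    Samples are assigned to folds round-robin; each sample is used
    for testing exactly once across the k folds.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

In a full pipeline, each peptide would be encoded by one or more such descriptors, and a classifier would be trained on the training indices and evaluated on the held-out indices of each fold.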
