Systematic analysis of global features and model building for recognition of antimicrobial peptides

With bacterial resistance to antibiotics growing, the search for new antibacterials is becoming paramount. Antimicrobial peptides (AMPs) provide interesting templates for antibacterial drug research. Our understanding of what confers antimicrobial activity on these peptides is currently poor, yet such understanding is the first step towards modifying existing AMPs or designing novel ones for treatment. Research in machine learning is beginning to focus on distinguishing AMPs from non-AMPs as a means of understanding which features confer activity on an AMP. Methods either seek new features and test them in the context of classification, or measure the classification power of features provided by biologists. In this paper, we provide a rigorous evaluation of features provided by a biologist or resulting from a combination of experimental and computational research. We present a statistics-based approach that measures the significance of each feature and uses this knowledge to construct predictive models. The models presented here are logistic regression models, which map the feature values of a peptide to the probability that the peptide is antimicrobial. We provide access to the proposed methodology through a web server. The server allows users to replicate the findings in this paper or evaluate their own features. We believe research in this direction will allow the community to make further progress and elucidate features that capture antimicrobial activity. This is an important first step towards assisting modification and/or de novo design of AMPs in the wet laboratory.
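To illustrate how a logistic regression model associates feature values with a probability, the following is a minimal sketch and not the paper's implementation. It assumes two hypothetical, standardized features per peptide (loosely thought of as net charge and hydrophobicity), uses synthetic data rather than real peptides, and fits the model with plain gradient descent; the function names `sigmoid`, `predict_proba`, and `fit_logistic` are illustrative only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability that a peptide with these feature values is an AMP."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic stand-in data (an assumption, not real peptides):
# label 1 = "AMP-like", label 0 = "non-AMP-like".
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    X.append([rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)])
    y.append(1)
    X.append([rng.gauss(-1.0, 1.0), rng.gauss(-0.5, 1.0)])
    y.append(0)

weights, bias = fit_logistic(X, y)
p_pos = predict_proba(weights, bias, [2.0, 1.0])
p_neg = predict_proba(weights, bias, [-2.0, -1.0])
print(f"P(AMP | high-valued features) = {p_pos:.3f}")
print(f"P(AMP | low-valued features)  = {p_neg:.3f}")
```

In a sketch like this, the sign and magnitude of each fitted weight indicate how the corresponding feature shifts the predicted probability; judging whether a feature is statistically significant requires a separate test, which this example does not attempt.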
