A deep learning method for predicting the minimum inhibitory concentration of antimicrobial peptides against Escherichia coli using Multi-Branch-CNN and Attention

ABSTRACT Antimicrobial peptides (AMPs) are a promising alternative to antibiotics for combating drug resistance in pathogenic bacteria. However, developing AMPs with high potency and specificity remains a challenge, and new tools to evaluate antimicrobial activity are needed to accelerate the discovery process. We therefore propose MBC-Attention, which combines a multi-branch convolutional neural network architecture with attention mechanisms to predict the experimental minimum inhibitory concentration (MIC) of peptides against Escherichia coli. The optimal MBC-Attention model achieved an average Pearson correlation coefficient (PCC) of 0.775 and a root mean squared error (RMSE) of 0.533 (log μM) in three independent tests of randomly drawn sequences from the data set. This corresponds to a 5–12% improvement in PCC and a 6–13% improvement in RMSE compared to 17 traditional machine learning models and 2 optimally tuned models using random forest and support vector machine. Ablation studies confirmed that the two proposed attention mechanisms, global attention and local attention, contributed substantially to the performance improvement.

IMPORTANCE Antimicrobial peptides (AMPs) are potential candidates for replacing conventional antibiotics to combat drug resistance in pathogenic bacteria. It is therefore necessary to evaluate the antimicrobial activity of AMPs quantitatively. However, wet-lab experiments are labor-intensive and time-consuming. To accelerate the evaluation process, we develop a deep learning method called MBC-Attention to regress the experimental minimum inhibitory concentration of AMPs against Escherichia coli. The proposed model outperforms traditional machine learning methods. Data, scripts to reproduce experiments, and the final production models are available on GitHub.
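
The two metrics reported in the abstract, PCC and RMSE on log-scaled MIC values, can be illustrated with a minimal sketch. This is not the authors' evaluation script: the function names and the toy values are hypothetical, and the use of base-10 logarithms is an assumption based on the "log μM" unit.

```python
import math


def to_log_mic(mic_um):
    """Convert MIC values in uM to a log scale (base 10 is an assumption)."""
    return [math.log10(v) for v in mic_um]


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy data (hypothetical, not from the paper): measured MICs in uM and
# model predictions already on the log scale.
measured = to_log_mic([1.0, 10.0, 100.0, 1000.0])
predicted = [0.2, 0.9, 2.3, 2.8]

pcc = pearson_cc(measured, predicted)
err = rmse(measured, predicted)
print(f"PCC = {pcc:.3f}, RMSE = {err:.3f} (log uM)")
```

A higher PCC means predictions track the ranking and spread of the measured values more closely, while a lower RMSE means smaller absolute errors on the log scale, so the two metrics are complementary.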
