The PepSeq Pipeline: Software for Antimicrobial Motif Discovery in Randomly-Generated Peptide Libraries

Bacteria carrying resistance genes are becoming ever more common, prompting the development of new methods for discovering antibiotics. One such method involves generating random peptides and testing their antimicrobial activity. Developing antibiotics from these peptides requires understanding which sequence motifs are toxic to bacteria. To determine whether the toxic peptides in a randomly generated peptide library can be uniquely classified based solely on sequence motifs, we created the PepSeq Pipeline, new software that uses a Random Forest algorithm to extract motifs from a peptide library. We found that motifs extracted from the model accurately classify 56% of the toxic peptides in the peptide library. On simulated data with less noise, the pipeline classified up to 94% of the toxic peptides. The pipeline extracted significant toxic motifs from every library tested, but its ability to classify all toxic peptides depended on the number of motifs in the library. Once extracted, these motifs can be used both to understand the biology behind why certain peptides are toxic and to create novel antibiotics. The code and data used in this analysis can be found at https://github.com/tjense25/pep-seq-pipeline.
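
As a rough illustration of what "classifying toxic peptides using extracted motifs" could look like, the sketch below scores a set of motifs by the fraction of toxic peptides that match at least one of them. It is not the pipeline's code: the motif representation (a mapping from position to required residue), the function names `matches` and `toxic_recall`, and the example sequences are all assumptions made up for illustration, and the Random Forest extraction step itself is not shown.

```python
from typing import Dict, List

# Hypothetical motif representation (an assumption, not the pipeline's format):
# a mapping from 0-based position to the residue required at that position.
# Positions not listed are treated as wildcards.
Motif = Dict[int, str]


def matches(peptide: str, motif: Motif) -> bool:
    """Return True if the peptide has every residue the motif requires."""
    return all(
        pos < len(peptide) and peptide[pos] == residue
        for pos, residue in motif.items()
    )


def toxic_recall(toxic_peptides: List[str], motifs: List[Motif]) -> float:
    """Fraction of toxic peptides matched by at least one motif."""
    if not toxic_peptides:
        return 0.0
    hits = sum(
        any(matches(peptide, motif) for motif in motifs)
        for peptide in toxic_peptides
    )
    return hits / len(toxic_peptides)


# Made-up example data, for illustration only.
toxic = ["KRWLAAGG", "KRWPTTSV", "ADEFGHIK", "MKLLVVAA"]
motifs = [
    {0: "K", 1: "R", 2: "W"},  # "KRW" at the start of the peptide
    {1: "K", 2: "L", 3: "L"},  # "KLL" starting at the second position
]

print(f"Toxic peptides covered by motifs: {toxic_recall(toxic, motifs):.0%}")
```

Under this reading, a figure such as 56% would be the share of toxic peptides covered by at least one extracted motif; a fuller evaluation would also check how often the same motifs match non-toxic peptides.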
