PDAUG - a Galaxy based toolset for peptide library analysis, visualization, and machine learning modeling

Computational methods for the initial screening and prediction of peptides with desired functions have proven to be effective alternatives to the lengthy and expensive methods traditionally used in peptide research, saving both time and effort. However, for many researchers, limited experience with programming libraries and limited access to computational resources and flexible pipelines are major hurdles to adopting these advanced methods. To address these barriers, we have implemented the Peptide Design and Analysis Under Galaxy (PDAUG) package, a Galaxy-based, Python-powered collection of tools, workflows, and datasets for rapid in silico peptide library analysis. PDAUG offers tools for peptide library generation, data visualization, retrieval of peptide sequences from built-in and public databases, peptide feature calculation, and machine learning modeling. In contrast to existing approaches such as standard programming libraries or rigid web-based tools, PDAUG offers a GUI-based toolset that gives researchers the flexibility to build and distribute reproducible pipelines and workflows without programming expertise. Additionally, researchers can combine PDAUG with hundreds of compatible existing Galaxy tools to support a wide range of analytic strategies. Finally, we demonstrate the usability of PDAUG by predicting the anticancer properties of peptides using four different feature sets, and we assess the suitability of various machine learning algorithms for this task.
