Dereplication, sequencing and identification of peptidic natural products: from genome mining to peptidogenomics to spectral networks.

Covering: 2000 to 2015. Recent breakthroughs in the discovery of peptide antibiotics and other peptidic natural products (PNPs) call for new algorithms to analyze them, yet computational technologies for high-throughput PNP discovery are still lacking. We discuss the computational bottlenecks in analyzing PNPs and review recent advances in genome mining, peptidogenomics, and spectral networks that now enable the discovery of new PNPs via mass spectrometry. We further describe how these advances connect to a new generation of software tools for PNP dereplication, de novo sequencing, and identification.
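In its simplest form, dereplication means recognizing that a spectrum comes from an already known PNP. The Python sketch below illustrates this idea for cyclic peptides under strong simplifying assumptions. Each known compound is stored as a list of residue masses. Its theoretical spectrum is taken to be the masses of all contiguous fragments of the ring, ignoring ion types, charge states and modifications. Candidates are first filtered by precursor mass, and the survivors are ranked by the number of peaks they share with the experimental spectrum within a fixed 0.02 Da tolerance. The function names, the tolerance, and the toy database `TOY_DB` are hypothetical illustrations. They are not the scoring scheme of any specific tool reviewed here.

```python
"""Toy cyclic-peptide dereplication sketch (simplified fragmentation model, hypothetical names)."""
from typing import Dict, List, Optional, Tuple


def cyclospectrum(residues: List[float]) -> List[float]:
    """Sorted masses of all contiguous subpeptides of a cyclic peptide, plus 0 and the full mass.

    Assumption: every linear fragment of the ring appears as a neutral mass;
    ion-type offsets, charge states and modifications are ignored.
    """
    n = len(residues)
    spectrum = [0.0, sum(residues)]
    for length in range(1, n):      # fragment lengths 1 .. n-1
        for start in range(n):      # every start position on the ring
            spectrum.append(sum(residues[(start + k) % n] for k in range(length)))
    return sorted(spectrum)


def shared_peaks(theoretical: List[float], experimental: List[float], tol: float = 0.02) -> int:
    """Count peaks matched within +/- tol; each peak is used at most once (greedy merge of sorted lists)."""
    t, e = sorted(theoretical), sorted(experimental)
    i = j = score = 0
    while i < len(t) and j < len(e):
        diff = t[i] - e[j]
        if abs(diff) <= tol:
            score += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # theoretical peak too light: move to the next one
        else:
            j += 1  # experimental peak too light: move to the next one
    return score


def dereplicate(
    experimental: List[float],
    precursor_mass: float,
    database: Dict[str, List[float]],
    tol: float = 0.02,
) -> Optional[Tuple[str, int]]:
    """Return (name, score) of the best-scoring known compound, or None if no precursor matches."""
    best: Optional[Tuple[str, int]] = None
    for name, residues in database.items():
        if abs(sum(residues) - precursor_mass) > tol:
            continue  # precursor-mass filter
        score = shared_peaks(cyclospectrum(residues), experimental, tol)
        if best is None or score > best[1]:
            best = (name, score)
    return best


# Hypothetical toy database; values are monoisotopic residue masses (Da) of G, A, V and L.
TOY_DB: Dict[str, List[float]] = {
    "cyclo(GAV) [toy]": [57.02146, 71.03711, 99.06841],
    "cyclo(GAL) [toy]": [57.02146, 71.03711, 113.08406],
}

if __name__ == "__main__":
    query = cyclospectrum(TOY_DB["cyclo(GAV) [toy]"])
    print(dereplicate(query, sum(TOY_DB["cyclo(GAV) [toy]"]), TOY_DB))
```

Running the script prints `('cyclo(GAV) [toy]', 8)`. The cyclo(GAL) candidate is discarded by the precursor-mass filter, and all eight theoretical peaks of cyclo(GAV) match the query. A practical tool would also need to judge whether such a shared-peak score is statistically significant.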
