BiosyntheticSPAdes: reconstructing biosynthetic gene clusters from assembly graphs

Predicting biosynthetic gene clusters (BGCs) is critically important for the discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. Existing BGC prediction tools often assume that each BGC is encoded within a single contig of the genome assembly. This condition is violated for most sequenced microbial genomes, where BGCs are often scattered across several contigs, which makes them difficult to reconstruct. The situation is even more severe in shotgun metagenomics, where contigs are often short and existing tools fail to predict a large fraction of long BGCs. Although it is difficult to predict BGCs that span multiple contigs, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs, and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomic datasets.
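To illustrate how an assembly graph can suggest ways to join contigs, the following is a minimal toy sketch, not the biosyntheticSPAdes algorithm itself. It assumes a hypothetical graph given as a dictionary from contig names to successor contigs, and a hypothetical set of contigs already known to carry biosynthetic domains; the function name `candidate_paths` and the `max_len` cutoff are likewise illustrative. It enumerates simple paths that begin and end at domain-bearing contigs, which is one plausible way to list candidate multi-contig segments.

```python
from typing import Dict, List, Set


def candidate_paths(
    graph: Dict[str, List[str]],
    domain_contigs: Set[str],
    max_len: int = 4,
) -> List[List[str]]:
    """Enumerate simple paths (at most max_len contigs) that start and end
    at contigs carrying biosynthetic domains. Toy illustration only."""
    paths: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        last = path[-1]
        # Record any multi-contig path that ends at a domain-bearing contig.
        if len(path) > 1 and last in domain_contigs:
            paths.append(list(path))
        if len(path) == max_len:
            return
        for nxt in graph.get(last, []):
            if nxt not in path:  # keep the path simple (no repeated contigs)
                path.append(nxt)
                dfs(path)
                path.pop()

    for start in sorted(domain_contigs):
        dfs([start])
    return paths


# Hypothetical toy graph: two alternative routes connect c1 and c4.
toy_graph = {"c1": ["c2", "c3"], "c2": ["c4"], "c3": ["c4"], "c4": []}
toy_domains = {"c1", "c4"}
paths = candidate_paths(toy_graph, toy_domains)
for p in paths:
    print(" -> ".join(p))
```

In this toy case the two routes between c1 and c4 correspond to two alternative candidate segments; a real tool would need additional evidence, such as coverage or domain order, to choose between them.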
