MIBiG 2.0: a repository for biosynthetic gene clusters of known function

Abstract

Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and for studying microbiome ecology. In these efforts, computational tools such as antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual curation of all entries to improve the annotation quality of the repository. We also redesigned the data schema so that future annotations comply with the standard. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.
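To make the idea of a schema that enforces compliance, and a query search over compliant entries, more concrete, here is a minimal sketch in plain Python. It is not the official MIBiG 2.0 schema or search implementation: the field names (`mibig_accession`, `biosyn_class`, `compounds`), the required-field set, and the helper functions `validate_entry` and `query_by_class` are hypothetical and chosen for illustration. Only the accession pattern (`BGC` followed by seven digits) follows the familiar MIBiG convention.

```python
import re
from typing import Any

# Hypothetical minimal schema: field names and types are illustrative only
# and are NOT the official MIBiG 2.0 JSON schema.
REQUIRED_FIELDS = {"mibig_accession": str, "biosyn_class": list, "compounds": list}
ACCESSION_RE = re.compile(r"^BGC\d{7}$")  # accession style, e.g. BGC0000001


def validate_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the entry is compliant."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            errors.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    acc = entry.get("mibig_accession")
    if isinstance(acc, str) and not ACCESSION_RE.match(acc):
        errors.append(f"malformed accession: {acc}")
    return errors


def query_by_class(entries: list[dict[str, Any]], biosyn_class: str) -> list[str]:
    """Hypothetical query search: accessions of compliant entries in a class."""
    return [
        e["mibig_accession"]
        for e in entries
        if not validate_entry(e) and biosyn_class in e["biosyn_class"]
    ]


# Usage with made-up entries (the compound name is a placeholder).
entries = [
    {"mibig_accession": "BGC0000001", "biosyn_class": ["Polyketide"],
     "compounds": [{"compound": "example compound"}]},
    {"mibig_accession": "BGC123", "biosyn_class": ["NRP"], "compounds": []},
]

if __name__ == "__main__":
    for e in entries:
        print(e["mibig_accession"], validate_entry(e))
    print(query_by_class(entries, "Polyketide"))
```

Validating at submission time, before an entry is stored, means that later queries can rely on every record having the same fields and types. This is the practical benefit of tightening a data schema.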
