IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites

ABSTRACT In the discovery of secondary metabolites, analysis of sequence data is a promising avenue that remains largely underutilized because computational platforms enabling such a systematic, large-scale approach are lacking. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, aimed at harnessing “big” genomic data to discover small molecules. IMG-ABC draws on IMG's comprehensive, integrated structural and functional genomic data to analyze biosynthetic gene clusters (BCs) and their associated secondary metabolites (SMs). SMs and BCs are the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is that it incorporates both experimentally validated and computationally predicted BCs from genomes as well as metagenomes, and can therefore identify BCs in uncultured populations and rare taxa. We demonstrate how IMG-ABC's focused, integrated analysis tools enable exploration of microbial secondary metabolism on a global scale through the first discovery of phenazine-producing clusters in Alphaproteobacteria. IMG-ABC aims to fill a long-standing gap in resources for the computational exploration of the secondary metabolism universe; its scalable framework allows traversal of unexplored phylogenetic and chemical structure space, opening the way to a new era in the discovery of novel molecules.

IMPORTANCE IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits.
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.
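
The abstract describes two main object classes, BCs and the SMs they produce. It also describes BCs as either experimentally validated or computationally predicted, and as originating from genomes or metagenomes. The following minimal Python sketch makes that model concrete under a simplified, assumed schema. The class names, the fields (`bc_type`, `taxon_class`, `evidence`, `source`, `products`), and the `find_clusters` helper are hypothetical and do not reflect IMG-ABC's actual schema or interface. The sketch filters toy records by cluster type and host taxonomic class, in the spirit of the phenazine example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Evidence(Enum):
    """How a cluster is supported: wet-lab validation or in silico prediction."""
    EXPERIMENTAL = "experimental"
    PREDICTED = "predicted"


class Source(Enum):
    """Where the cluster was found: an isolate genome or a metagenome."""
    GENOME = "genome"
    METAGENOME = "metagenome"


@dataclass
class SecondaryMetabolite:
    # Hypothetical minimal SM record; real IMG-ABC attributes are richer.
    name: str


@dataclass
class BiosyntheticCluster:
    # Hypothetical minimal BC record; field names are illustrative only.
    cluster_id: str
    bc_type: str        # cluster type label, e.g. "phenazine"
    taxon_class: str    # taxonomic class of the host organism or bin
    evidence: Evidence
    source: Source
    # Empty by default so predicted BCs without a known product fit the model.
    products: List[SecondaryMetabolite] = field(default_factory=list)


def find_clusters(clusters: List[BiosyntheticCluster], bc_type: str,
                  taxon_class: str,
                  evidence: Optional[Evidence] = None) -> List[BiosyntheticCluster]:
    """Keep clusters of one type in one taxonomic class, optionally by evidence."""
    return [
        c for c in clusters
        if c.bc_type == bc_type
        and c.taxon_class == taxon_class
        and (evidence is None or c.evidence == evidence)
    ]


# Invented toy records for illustration; these are not IMG-ABC data.
clusters = [
    BiosyntheticCluster("bc-1", "phenazine", "Gammaproteobacteria",
                        Evidence.EXPERIMENTAL, Source.GENOME,
                        [SecondaryMetabolite("phenazine-1-carboxylic acid")]),
    BiosyntheticCluster("bc-2", "phenazine", "Alphaproteobacteria",
                        Evidence.PREDICTED, Source.GENOME),
    BiosyntheticCluster("bc-3", "NRPS", "Alphaproteobacteria",
                        Evidence.PREDICTED, Source.METAGENOME),
]

hits = find_clusters(clusters, "phenazine", "Alphaproteobacteria")
print([c.cluster_id for c in hits])  # ['bc-2']
```

In this sketch, keeping the evidence and source labels as explicit fields lets the same query span validated and predicted clusters. A single optional argument then narrows the results to one kind of evidence.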