Bioinformatics approaches and software for detection of secondary metabolic gene clusters.

The accelerating pace of microbial genomics is sparking a renaissance in natural products research. Researchers can now preview an organism's secondary metabolome by analyzing its genomic sequence. Combined with other -omics data, this approach may provide a cost-effective alternative to industrial high-throughput screening in drug discovery. In the last few years, several computational tools have been developed to facilitate this process by identifying genes involved in secondary metabolite biosynthesis in bacterial and fungal genomes. Here, we review seven software programs that are available for this purpose, with an emphasis on antibiotics & Secondary Metabolite Analysis SHell (antiSMASH) and Secondary Metabolite Unknown Regions Finder (SMURF), the only tools that can comprehensively detect complete secondary metabolite biosynthesis gene clusters. We also discuss five related software packages that identify secondary metabolite backbone biosynthesis genes: CLUster SEquence ANalyzer (CLUSEAN), ClustScan, Structure Based Sequence Analysis of Polyketide Synthases (SBSPKS), NRPSPredictor, and Natural Product searcher (NP.searcher). This chapter offers detailed protocols, suggestions, and caveats to help researchers use these tools effectively.
