plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied to 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, as the cost of plant genome sequencing continues to fall, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results and source code are freely available from http://plantismash.secondarymetabolites.org.
