Analysis of Arabidopsis non-reference accessions reveals high diversity of metabolic gene clusters and discovers new candidate cluster members

Metabolic gene clusters (MGCs) are groups of genes involved in a common biosynthetic pathway. They are frequently formed in dynamic chromosomal regions, which may lead to intraspecies variation and cause phenotypic diversity. We examined copy number variations (CNVs) in four Arabidopsis thaliana MGCs in over one thousand accessions using experimental and bioinformatic approaches. The tirucalladienol and marneral gene clusters showed little variation, and the latter was fixed in the population. The thalianol and, especially, the arabidiol/baruol gene clusters displayed substantial diversity. The compact version of the thalianol gene cluster was predominant and more conserved than the noncontiguous version. In the arabidiol/baruol cluster, we found a large genomic insertion containing divergent duplicates of the CYP705A2 and BARS1 genes. The BARS1 paralog, which we named BARS2, encoded a novel oxidosqualene synthase. The expression of the entire arabidiol/baruol gene cluster was altered in the accessions with the duplication. Moreover, these accessions showed different root growth dynamics and were associated with warmer climates than the reference-like accessions. Across the entire genome, paired genes encoding terpene synthases and cytochrome P450 oxidases were more variable than their nonpaired counterparts. Our study highlights the role of dynamically evolving MGCs in plant adaptation and phenotypic diversity.
