Building an octaploid genome and transcriptome of the medicinal plant Pogostemon cablin from Lamiales

The Lamiales order presents highly varied genome sizes and highly specialized life strategies. Patchouli, Pogostemon cablin (Blanco) Benth., a member of the Lamiales, has been widely cultivated in tropical and subtropical areas of Asia owing to high demand for its essential oil. Here, we generated ~681 Gb of genomic sequence (~355X coverage) for patchouli; the assembled genome is ~1.91 Gb and contains 110,850 predicted protein-coding genes. Analyses showed clear evidence of a whole-genome octuplication (WGO) since the pan-eudicot γ triplication; this WGO is a recent and exclusive polyploidization event that occurred ~6.31 million years ago. Analysis of the terpene synthase (TPS) gene family showed expansion of the type-a subfamily, which is responsible for the synthesis of sesquiterpenes and may be highly specialized in patchouli. Our datasets provide valuable resources for studying plant genome evolution and for identifying genes related to secondary metabolites and the regulation of their expression.

Design Type(s): phylogenetic analysis objective • replicate design • sequence assembly objective
Measurement Type(s): whole genome sequencing • transcriptional profiling assay
Technology Type(s): DNA sequencing • RNA sequencing
Factor Type(s): Read Length • biological replicate
Sample Characteristic(s): Pogostemon cablin • root • stem • leaf

Machine-accessible metadata file describing the reported data (ISA-Tab format)
