Chromosome-level genome assembly of an endangered plant Prunus mongolica using PacBio and Hi-C technologies

Abstract

Prunus mongolica is an ecologically and economically important xerophytic tree native to Northwest China. Here, we report a high-quality, chromosome-level P. mongolica genome assembly generated by integrating PacBio high-fidelity (HiFi) sequencing with Hi-C technology. The assembled genome was 233.17 Mb in size, with 98.89% anchored to eight pseudochromosomes. Contig and scaffold N50s were 24.33 Mb and 26.54 Mb, respectively, the BUSCO completeness score was 98.76%, and CEGMA identified 98.47% of the core eukaryotic genes in the assembly. The genome contained 88.54 Mb (37.97%) of repetitive sequences and 23,798 protein-coding genes. We found that P. mongolica has undergone two whole-genome duplications, the most recent occurring ~3.57 million years ago. Phylogenetic and chromosomal synteny analyses revealed that P. mongolica is closely related to P. persica and P. dulcis. Furthermore, we identified candidate genes involved in drought tolerance and fatty acid biosynthesis. These genes should be useful for studying both traits in P. mongolica and will provide genetic resources for molecular breeding and improvement of Prunus species. This high-quality reference genome will also accelerate studies of drought adaptation in xerophytic plants.
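Several of the figures above come from standard calculations. The Python sketch below illustrates three of them: N50 from a list of sequence lengths, repeat content as a percentage of assembly size, and the commonly used T = Ks / (2r) approximation for dating a whole-genome duplication from a synonymous-divergence (Ks) peak. The abstract does not state which Ks peak or substitution rate r was used, so the helper ks_to_age_years and its rate argument are illustrative assumptions, and the toy contig lengths are hypothetical rather than taken from the P. mongolica assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one sequence length")
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length  # accumulate from the longest sequence down
        if running >= half:
            return length
    # Not reached for non-negative lengths; kept as a defensive fallback
    return sorted_lengths[-1]


def percent_of_genome(part_mb: float, genome_mb: float) -> float:
    """Express an amount of sequence as a percentage of genome size."""
    return 100.0 * part_mb / genome_mb


def ks_to_age_years(ks: float, rate_per_site_per_year: float) -> float:
    """Convert a Ks peak to an approximate duplication age in years.

    Uses T = Ks / (2r); r is an assumed neutral substitution rate
    (substitutions per site per year), not a value from the study.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical toy contig lengths, not the real assembly
    print(n50([10, 8, 5, 3, 2]))
    # Repeat content reported in the abstract: 88.54 Mb of 233.17 Mb
    print(round(percent_of_genome(88.54, 233.17), 2))
```

Run as a script, it prints the toy N50 (8) and recovers the reported 37.97% repeat fraction from 88.54 Mb and 233.17 Mb. The age helper is left without a worked value because the Ks peak and rate behind the ~3.57 Mya estimate are not given here.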
