Plastid Genome Assembly Using Long-read Data (ptGAUL)

Although plastid genome (plastome) structure is highly conserved across most seed plants, investigations during the past two decades have revealed several disparately related lineages that experienced substantial rearrangements. Most plastomes contain a large inverted repeat, two single-copy regions, and few dispersed repeats; however, the plastomes of some taxa harbor long repeat sequences (>300 bp). These long repeats make it difficult to assemble complete plastomes using short-read data, leading to misassemblies and consensus sequences with spurious rearrangements. Single-molecule, long-read sequencing has the potential to overcome these challenges, yet there is no consensus on the most effective method for accurately assembling plastomes using long-read data. We developed a pipeline, plastid Genome Assembly Using Long-read data (ptGAUL), to address the problem of plastome assembly using long-read data from Oxford Nanopore Technologies (ONT) or Pacific Biosciences platforms. We demonstrated the efficacy of the ptGAUL pipeline using 16 published long-read datasets and showed that it produces accurate and unbiased assemblies. Additionally, we employed ptGAUL to assemble four new Juncus (Juncaceae) plastomes using ONT long reads. Our results revealed many long repeats and rearrangements in Juncus plastomes compared with basal lineages of Poales.
