De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping

Background: It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers), it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high-quality draft assemblies is extremely important in the case of yeast genomes, to better characterise major events in their evolutionary history. The aim of this work is twofold: on the one hand, we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness; on the other hand, we present a de novo assembly pipeline that we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, we present the results obtained using the Dekkera bruxellensis genome.

Methods: In this work we used short-read Illumina data and long-read PacBio data, combined with the extreme long-range information from OpGen optical maps, for de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work.

Results: We obtained a high-quality draft assembly of a yeast genome, resolved at the chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors, as demonstrated by resolving a large collapsed repeat and by the higher scores it received from assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5% of the optically mapped genome not covered by the Illumina data.
