Improvement of the threespine stickleback (Gasterosteus aculeatus) genome using a Hi-C-based Proximity-Guided Assembly method

Scaffolding genomes into complete chromosome assemblies remains challenging, even with the rapidly increasing sequence coverage generated by current next-generation sequencing technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C-based proximity-guided assembly method to perform a de novo genome assembly from relatively short contigs. Using this method, we generated complete chromosome assemblies from 50 kb contigs. We found that 98.99% of contigs were correctly assigned to linkage groups, with an ordering nearly identical to that of the previous genome assembly. Using available BAC end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due either to structural variation between the populations used for the two assemblies or to errors in the existing assembly. The Hi-C assembly also allowed us to improve the existing assembly by assigning over 60% (13.35 Mb) of the previously unassigned contigs (∼21.7 Mb) to linkage groups. Together, our results highlight the potential of the Hi-C-based proximity-guided assembly method to be used in combination with short-read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain the high-molecular-weight DNA required for other scaffolding methods.
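The principle behind assigning contigs to linkage groups from Hi-C data is that contigs on the same chromosome share far more Hi-C read pairs than contigs on different chromosomes. The following is a minimal sketch of that idea only, not the assembly pipeline used in this work: the function name `cluster_contigs`, the greedy average-linkage merging rule, and the toy contact counts are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_contigs(contacts, n_groups):
    """Greedy average-linkage clustering of contigs by Hi-C contact counts.

    contacts: symmetric matrix (list of lists); contacts[i][j] is the number
              of Hi-C read pairs linking contig i and contig j (hypothetical).
    n_groups: number of linkage groups (chromosomes) to recover.
    """
    # Start with every contig in its own cluster.
    clusters = [[i] for i in range(len(contacts))]

    def avg_links(a, b):
        # Mean contact count over all contig pairs spanning clusters a and b.
        total = sum(contacts[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the two clusters with the strongest average contact.
    while len(clusters) > n_groups:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_links(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: combinations() guarantees x < y

    return [sorted(c) for c in clusters]


# Toy data (invented): contigs 0-2 lie on one chromosome, contigs 3-5 on another.
contacts = [
    [0, 90, 40, 3, 2, 1],
    [90, 0, 85, 2, 4, 2],
    [40, 85, 0, 1, 3, 2],
    [3, 2, 1, 0, 70, 30],
    [2, 4, 3, 70, 0, 95],
    [1, 2, 2, 30, 95, 0],
]

groups = cluster_contigs(contacts, n_groups=2)
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

Real Hi-C scaffolders additionally normalize contact counts for contig length and restriction-site density, and then order and orient contigs within each group; none of that is modeled in this sketch.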
