Dog10K_Boxer_Tasha_1.0: A Long-Read Assembly of the Dog Reference Genome

The domestic dog has become an important biomedical model for studying the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer named “Tasha”, first published in 2005. Derived from a Sanger whole-genome shotgun sequencing approach coupled with limited clone-based sequencing, the initial assembly and its subsequent updates have served as the predominant resource for canine genetics for 15 years. Although the initial assembly was a good-quality draft, like all assemblies of its time it contained gaps, assembly errors and missing sequence, particularly in GC-rich regions, which occur at many promoters and in the first exons of protein-coding genes. Here we present Dog10K_Boxer_Tasha_1.0, an improved, highly contiguous, chromosome-level assembly of Tasha's genome generated with long-read technologies. It increases sequence contiguity >100-fold, closes >23,000 gaps in the CanFam3.1 reference assembly and improves gene annotation by identifying >1,200 new protein-coding transcripts. The assembly and annotation are available at NCBI under accession GCF_000002285.5.
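The abstract reports gap closure and a contiguity gain without defining how either is measured. By convention, gaps in a FASTA-formatted assembly are stored as runs of N, and contiguity is commonly summarized by contig N50. The abstract does not say which metric underlies the >100-fold figure, so N50 is an assumption here. The sketch below is a minimal illustration, not the authors' pipeline. It counts gap runs and compares contig N50 between a gapped draft and a gap-closed version. The function names and toy sequences are hypothetical.

```python
import re

# Assumption: assembly gaps are encoded as maximal runs of 'N'/'n' in each
# scaffold sequence, as is conventional for FASTA reference assemblies.
GAP_RE = re.compile(r"[Nn]+")


def count_gaps(scaffolds):
    """Count gap runs (maximal stretches of N) across all scaffolds."""
    return sum(len(GAP_RE.findall(seq)) for seq in scaffolds)


def split_into_contigs(scaffolds):
    """Break each scaffold at its gaps and return the gap-free contig lengths."""
    lengths = []
    for seq in scaffolds:
        # Filter out empty pieces produced when a scaffold starts or ends with N.
        lengths.extend(len(piece) for piece in GAP_RE.split(seq) if piece)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical toy sequences, not real dog genome data.
    draft = ["ACGTNNNNACGTACGTNNGGCC", "ATATNNNCGCG"]
    improved = ["ACGTACGTACGTACGTGGCC", "ATATCGCG"]

    print("gaps:", count_gaps(draft), "->", count_gaps(improved))
    draft_n50 = n50(split_into_contigs(draft))
    new_n50 = n50(split_into_contigs(improved))
    print("contig N50:", draft_n50, "->", new_n50, f"({new_n50 / draft_n50:.1f}-fold)")
```

On these toy inputs the script prints `gaps: 3 -> 0` and `contig N50: 4 -> 20 (5.0-fold)`. Real assemblies would be read from FASTA files, and gap definitions (for example, a minimum N-run length) vary between tools.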
