How complete are “complete” genome assemblies?—An avian perspective

The genomics revolution has led to the sequencing of a large variety of nonmodel organisms, and the resulting assemblies are often referred to as “whole” or “complete” genomes. But how complete are these, really? Here, we use birds as an example for nonmodel vertebrates and find that, although suitable in principle for genomic studies, the current standard of short‐read assemblies misses a significant proportion of the expected genome size (7% to 42%; mean 20 ± 9%). In particular, regions with strongly deviating nucleotide composition (e.g., guanine‐cytosine [GC]‐rich regions) and regions highly enriched in repetitive DNA (e.g., transposable elements and satellite DNA) are usually underrepresented in assemblies. However, long‐read sequencing technologies successfully characterize many of these underrepresented GC‐rich or repeat‐rich regions in several bird genomes. For instance, only ~2% of the expected total base pairs are missing in the latest chicken reference (galGal5). Even so, these assemblies still contain thousands of gaps (i.e., fragmented sequences) because some chromosomal structures (e.g., centromeres) likely contain arrays of repetitive DNA that are too long to bridge with currently available technologies. We discuss how to minimize the number of assembly gaps by combining the latest available technologies with complementary strengths. Finally, we emphasize the importance of knowing the location, size and potential content of assembly gaps when making population genetic inferences about adjacent genomic regions.
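As a rough illustration of the two quantities discussed above, the following Python sketch computes the proportion of an expected genome size that is missing from an assembly and locates assembly gaps. It assumes that gaps are encoded as runs of `N` in the assembled sequence and that an independent estimate of the expected genome size is available; the function names and all numbers are hypothetical and are not taken from the study.

```python
import re


def missing_fraction(assembled_bp: int, expected_bp: int) -> float:
    """Return the share of the expected genome size absent from the assembly.

    assembled_bp: number of non-gap base pairs in the assembly (hypothetical input).
    expected_bp:  independently estimated genome size (hypothetical input).
    """
    if expected_bp <= 0:
        raise ValueError("expected_bp must be positive")
    # Clamp at zero in case the assembly exceeds the size estimate.
    return max(0.0, 1.0 - assembled_bp / expected_bp)


def find_gaps(sequence: str) -> list[tuple[int, int]]:
    """Return (start, length) for every run of N/n, assumed here to mark a gap."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", sequence)]


if __name__ == "__main__":
    # Hypothetical example: 1.00 Gb assembled against an expected 1.25 Gb genome.
    print(f"missing: {missing_fraction(1_000_000_000, 1_250_000_000):.0%}")

    # Toy scaffold with two gaps of different sizes.
    for start, length in find_gaps("ACGTNNNNACGTNNACGT"):
        print(f"gap at {start}, length {length}")
```

Reporting gap positions and lengths alongside the missing fraction keeps the gaps visible when regions adjacent to them are later analysed.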
