Recompleting the Caenorhabditis elegans genome

Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans strain available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology.
