A reference genome of the Chinese hamster based on a hybrid assembly strategy

Accurate and complete genome sequences are essential in biotechnology because they underpin genome-based cell engineering. The current genome assemblies for the Chinese hamster, Cricetulus griseus, are fragmented and contain numerous gaps and misassemblies, as is typical of short-read-based assemblies. Here, we completely resequenced C. griseus using single-molecule real-time (SMRT) sequencing and merged the result with Illumina-based assemblies. The merged assembly is more contiguous and complete than an assembly from either technology alone: the number of scaffolds is reduced >28-fold, and 90% of the sequence lies in the 122 longest scaffolds. Most genes now reside on single scaffolds together with their up- and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary (CHO) cell mutations have been detected in regions that were gaps in the draft assemblies. This new assembly will be an invaluable resource for continued basic and pharmaceutical research.
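
The contiguity figure quoted in the abstract (90% of the sequence in the 122 longest scaffolds) is the kind of statistic often called L90. As a rough illustration of how such a number can be derived from a list of scaffold lengths, here is a minimal Python sketch. The function name `l_statistic` and the toy lengths are hypothetical and are not taken from the paper or its assembly.

```python
def l_statistic(scaffold_lengths, fraction=0.9):
    """Return (count, shortest_length) for the smallest set of longest
    scaffolds whose combined length covers `fraction` of the assembly.

    `count` corresponds to an L-statistic (e.g. L90) and `shortest_length`
    to the matching N-statistic (e.g. N90). Hypothetical helper for
    illustration only.
    """
    if not scaffold_lengths:
        raise ValueError("scaffold_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    total = sum(scaffold_lengths)
    target = fraction * total

    running = 0
    # Walk from the longest scaffold to the shortest, accumulating length
    # until the requested fraction of the assembly is covered.
    for count, length in enumerate(sorted(scaffold_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return count, length


# Toy example (made-up lengths, not real assembly data).
toy_lengths = [50, 30, 12, 5, 3]
count, shortest = l_statistic(toy_lengths, fraction=0.9)
print(f"{count} scaffolds cover 90% of the toy assembly; shortest is {shortest}")
```

Applied to the real scaffold lengths of an assembly, the first returned value would be the number reported as "the N longest scaffolds" for the chosen fraction.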
