Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations

Current de novo whole-genome sequencing approaches are often inadequate for organisms lacking substantial preexisting genetic data. The problems manifest in three ways: large numbers of scaffolds that are neither ordered within chromosomes nor assigned to individual chromosomes; misassembly of allelic sequences as separate loci when the sequenced individual(s) are heterozygous; and the collapse of recently duplicated sequences into a single locus, regardless of the level of heterozygosity. Here we propose a new approach for producing de novo whole-genome sequences, which we call recombinant population genome construction, that solves many of the problems encountered in standard genome assembly and can be applied in both model and nonmodel organisms. Our approach takes advantage of next-generation sequencing technologies to barcode and sequence a large number of individuals from a recombinant population at the same time. The sequences of all recombinants are combined to create an initial de novo assembly; the genotypes of the individual recombinants are then used to correct assembly splitting and collapsing and to order and orient scaffolds within linkage groups. Recombinant population genome construction can accelerate the transformation of nonmodel species into genome-enabled systems by producing a high-quality genome assembly together with genomic tools (e.g., high-confidence single-nucleotide polymorphisms) for immediate applications. In populations segregating for important functional traits, the approach also enables mapping of quantitative trait loci from the same data. We demonstrate our method using simulated Illumina data from a recombinant population of Caenorhabditis elegans and show that it can produce a high-fidelity, high-quality genome assembly for both parents of the cross.
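
The scaffold-ordering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one marker per scaffold, recombinant inbred lines with genotypes already called as parent 0 or parent 1 (None for missing data), single-linkage grouping at a hypothetical recombination-fraction cutoff, and a greedy nearest-neighbour ordering. The scaffold names, genotype vectors, and the `max_rf` value are all invented for illustration.

```python
from itertools import combinations

def recomb_fraction(a, b):
    """Fraction of individuals whose parental genotype differs between two markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.5  # no shared data: treat the markers as unlinked
    return sum(x != y for x, y in pairs) / len(pairs)

def linkage_groups(genotypes, max_rf=0.2):
    """Single-linkage clustering of scaffolds (union-find); max_rf is an assumed cutoff."""
    parent = {s: s for s in genotypes}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for s, t in combinations(genotypes, 2):
        if recomb_fraction(genotypes[s], genotypes[t]) <= max_rf:
            parent[find(s)] = find(t)

    groups = {}
    for s in genotypes:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

def order_group(group, genotypes):
    """Greedy ordering: start at a likely chromosome end, then walk to the nearest scaffold."""
    def rf(s, t):
        return recomb_fraction(genotypes[s], genotypes[t])

    remaining = sorted(group)
    # The scaffold with the largest total distance to the others is taken as one end.
    current = max(remaining, key=lambda s: sum(rf(s, t) for t in remaining if t != s))
    order = [current]
    remaining.remove(current)
    while remaining:
        current = min(remaining, key=lambda t: rf(order[-1], t))
        order.append(current)
        remaining.remove(current)
    return order

# Toy data: eight recombinant lines, five scaffolds (all values are made up).
genotypes = {
    "sc09": [0, 0, 0, 0, 1, 1, 1, 1],
    "sc02": [0, 0, 0, 1, 1, 1, 1, 1],
    "sc05": [0, 0, 1, 1, 1, 1, 1, 1],
    "sc13": [0, 1, 0, 1, 0, 1, 0, 1],
    "sc21": [0, 1, 0, 1, 0, 1, 1, 1],
}

groups = linkage_groups(genotypes)
ordered = [order_group(g, genotypes) for g in groups]
print(ordered)
```

In this toy example the five scaffolds fall into two linkage groups, and the first group is ordered so that adjacent scaffolds differ by a single recombinant line. Real data would need many markers per scaffold, genotype-error handling, and a more robust ordering method than a greedy walk; orientation within a scaffold would additionally require at least two markers per scaffold.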
