Reconstructing complex regions of genomes using long-read sequencing technology

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single-molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and found that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.
