High throughput barcoding method for genome-scale phasing

The future of human genomics is one that seeks to resolve the entirety of genetic variation through sequencing. Using genomics for medical purposes requires cost-efficient and accurate base calling, long-range haplotyping capability, and reliable calling of structural variants. Short-read sequencing has led the development towards such a future but has struggled to meet the latter two of these needs. To address this limitation, we developed a technology that preserves the molecular origin of short sequencing reads with a negligible increase in sequencing costs. We demonstrate a novel library preparation method for high-throughput barcoding of short reads, in which millions of random barcodes can be used to reconstruct megabase-scale phase blocks.
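The general principle behind barcode-based phasing can be illustrated with a toy sketch. This is not the method described here, and it should not be read as such. It assumes reads have already been aligned and genotyped at heterozygous sites, and that each read has been reduced to a hypothetical `(barcode, position, allele)` tuple. For simplicity, it also assumes each barcode marks a single long molecule. Real data would first need to split barcodes into molecules by genomic proximity. Sites seen under the same barcode vote on whether their alleles lie on the same haplotype. A parity union-find then merges the best-supported links into phase blocks. The function name `phase_blocks` is illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def phase_blocks(observations):
    """Toy linked-read phasing (illustrative only, not a production phaser).

    observations: iterable of (barcode, position, allele) tuples, where
    allele is 0 or 1 at a heterozygous site.
    Returns a list of phase blocks sorted by start position; each block is a
    dict mapping position -> allele carried by haplotype A.
    """
    # Assumption: reads sharing a barcode come from one long molecule,
    # i.e. from a single haplotype.
    molecules = defaultdict(dict)
    for barcode, pos, allele in observations:
        molecules[barcode][pos] = allele

    # Each molecule votes on the relative parity of every pair of sites it
    # covers: index 0 = same allele on a haplotype, index 1 = opposite.
    votes = defaultdict(lambda: [0, 0])
    for sites in molecules.values():
        for p, q in combinations(sorted(sites), 2):
            votes[(p, q)][sites[p] ^ sites[q]] += 1

    # Union-find where parity[x] is x's parity relative to parent[x].
    parent = {pos: pos for sites in molecules.values() for pos in sites}
    parity = {pos: 0 for pos in parent}

    def find(x):
        if parent[x] == x:
            return x, 0
        root, p = find(parent[x])
        parent[x] = root   # path compression
        parity[x] ^= p     # parity is now relative to the root
        return root, parity[x]

    # Merge pairs by decreasing majority margin. Ties carry no information,
    # and edges joining already-connected sites are skipped, so the
    # best-supported links win any conflict.
    edges = sorted(votes.items(), key=lambda kv: -abs(kv[1][0] - kv[1][1]))
    for (p, q), (same, diff) in edges:
        if same == diff:
            continue
        rel = 0 if same > diff else 1
        rp, a = find(p)
        rq, b = find(q)
        if rp != rq:
            parent[rq] = rp
            parity[rq] = a ^ b ^ rel

    # Collect connected sites into blocks. Haplotype A is defined as the one
    # carrying allele 0 at the first site of each block.
    groups = defaultdict(list)
    for pos in parent:
        root, par = find(pos)
        groups[root].append((pos, par))
    blocks = []
    for members in groups.values():
        if len(members) < 2:
            continue  # an isolated site is not phased relative to anything
        members.sort()
        first_par = members[0][1]
        blocks.append({pos: par ^ first_par for pos, par in members})
    return sorted(blocks, key=min)
```

For example, two molecules on one haplotype can cover sites 100/200 and 200/300. A third molecule from the other haplotype can cover 100/200 again. Together these yield one three-site block. Molecules that never overlap those sites form separate blocks. This is why block length depends on molecule length and on coverage of overlapping molecules.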