Single tube bead-based DNA co-barcoding for cost effective and accurate sequencing, haplotyping, and assembly

Single tube long fragment read (stLFR) technology enables efficient whole-genome sequencing (WGS), haplotyping, and contig scaffolding. It is based on adding the same barcode sequence to sub-fragments of the original DNA molecule (DNA co-barcoding). To achieve this, stLFR uses the surface of microbeads to create millions of miniaturized compartments in a single tube. Using a combinatorial process, over 1.8 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding in reactions with 50 million barcodes. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20-300 kb genomic DNA fragments, with near-perfect variant calling and phasing of the NA12878 genome into contigs with an N50 of up to 23.4 Mb. stLFR represents a low-cost, single-library solution that can enable long sequence data.
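The "practically non-redundant" claim can be sanity-checked with a rough occupancy estimate. The sketch below is not the authors' calculation: it assumes, purely for illustration, that the 50 million barcodes in a reaction are drawn uniformly at random (with replacement) from the pool of 1.8 billion, and the function name `fraction_sharing` is hypothetical.

```python
import math


def fraction_sharing(n_items: int, n_bins: int) -> float:
    """Expected fraction of items that share a bin with at least one other item.

    Assumes each item is assigned to a bin uniformly and independently.
    P(item is alone) = (1 - 1/n_bins) ** (n_items - 1), evaluated with
    log1p/expm1 to stay accurate when 1/n_bins is tiny.
    """
    log_p_alone = (n_items - 1) * math.log1p(-1.0 / n_bins)
    return -math.expm1(log_p_alone)


# Figures taken from the abstract.
BARCODE_POOL = 1_800_000_000      # unique barcode sequences generated on beads
BARCODES_PER_REACTION = 50_000_000

# Assumption: barcodes in one reaction are a uniform random sample of the pool.
redundant = fraction_sharing(BARCODES_PER_REACTION, BARCODE_POOL)
print(f"Expected fraction of barcodes that are not unique: {redundant:.2%}")
```

Under this simplified model only a few percent of the barcodes in a reaction would collide with another one, which is consistent with the abstract's wording. The real figure depends on how the bead pool is built and sampled.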
