Characterization of structural variants with single molecule and hybrid sequencing approaches

MOTIVATION Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. RESULTS We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly.

[1]  Philip M. Kim,et al.  Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome , 2007, Science.

[2]  Glenn Tesler,et al.  Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory , 2012, BMC Bioinformatics.

[3]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[4]  Tae-Min Kim,et al.  Detecting structural variations in the human genome using next generation sequencing. , 2010, Briefings in functional genomics.

[5]  Michael C. Rusch,et al.  CREST maps somatic structural variation in cancer genomes with base-pair resolution , 2011, Nature Methods.

[6]  S. Turner,et al.  Real-time DNA sequencing from single polymerase molecules. , 2010, Methods in enzymology.

[7]  Ryan M. Layer,et al.  Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms , 2013, Genome research.

[8]  E. Eichler,et al.  Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. , 2009, Genome research.

[9]  Matthew E Hurles,et al.  The functional impact of structural variation in humans. , 2008, Trends in genetics : TIG.

[10]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[11]  Zhaoshi Jiang,et al.  Characterization of six human disease-associated inversion polymorphisms , 2009, Human molecular genetics.

[12]  J. Korbel,et al.  Criteria for Inference of Chromothripsis in Cancer Genomes , 2013, Cell.

[13]  Life Technologies,et al.  A map of human genome variation from population-scale sequencing , 2011 .

[14]  Yadong Wang,et al.  PRISM: Pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants , 2012, Bioinform..

[15]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[16]  E. Mardis Genome sequencing and cancer. , 2012, Current opinion in genetics & development.

[17]  Ali Bashir,et al.  Structural variation analysis with strobe reads , 2010, Bioinform..

[18]  Jessica C. Ebert,et al.  Accurate whole genome sequencing and haplotyping from10-20 human cells , 2012, Nature.

[19]  R. Wilson,et al.  BreakDancer: An algorithm for high resolution mapping of genomic structural variation , 2009, Nature Methods.

[20]  Faraz Hach,et al.  Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery , 2010, Bioinform..

[21]  R. Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[22]  Andrew J Sharp,et al.  Structural variation of the human genome. , 2006, Annual review of genomics and human genetics.

[23]  Joshua M. Korn,et al.  Mapping and sequencing of structural variation from eight human genomes , 2008, Nature.

[24]  Bradley P. Coe,et al.  Genome structural variation discovery and genotyping , 2011, Nature Reviews Genetics.

[25]  Mauricio O. Carneiro,et al.  The advantages of SMRT sequencing , 2013, Genome Biology.

[26]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[27]  Kenny Q. Ye,et al.  Mapping copy number variation by population scale genome sequencing , 2010, Nature.

[28]  Michael Peter Stromberg Enabling high-throughput sequencing data analysis with MOSAIK , 2010 .

[29]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[30]  S. Salzberg,et al.  TopHat-Fusion: an algorithm for discovery of novel fusion transcripts , 2011, Genome Biology.

[31]  Ira M. Hall,et al.  Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. , 2010, Genome research.

[32]  Benjamin J. Raphael,et al.  An integrative probabilistic model for identification of structural variation in sequencing data , 2012, Genome Biology.

[33]  Ali Bashir,et al.  Structural variation analysis with strobe reads , 2010 .

[34]  Mark Gerstein,et al.  AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision , 2011, Bioinform..

[35]  K. Choy,et al.  The impact of human copy number variation on a new era of genetic testing , 2010, BJOG : an international journal of obstetrics and gynaecology.

[36]  M. Gerstein,et al.  PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data , 2009, Genome Biology.

[37]  Seunghak Lee,et al.  A robust framework for detecting structural variations in a genome , 2008, ISMB.

[38]  S. Salzberg,et al.  Fast algorithms for large-scale genome alignment and comparison. , 2002, Nucleic acids research.

[39]  Timothy B. Stockwell,et al.  The Diploid Genome Sequence of an Individual Human , 2007, PLoS biology.

[40]  Ali Bashir,et al.  A geometric approach for classification and comparison of structural variants , 2009, Bioinform..

[41]  Ira M. Hall,et al.  Characterizing complex structural variation in germline and somatic genomes. , 2012, Trends in genetics : TIG.

[42]  Jay Shendure,et al.  Capturing native long-range contiguity by in situ library construction and optical sequencing , 2012, Proceedings of the National Academy of Sciences.