PRISM: Pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants

MOTIVATION: The development of high-throughput sequencing technologies has enabled novel methods for detecting structural variants (SVs). Current methods are typically based on depth of coverage or on clusters of paired-end mappings. However, most of these report only an approximate location for each SV rather than exact breakpoints.

RESULTS: We have developed pair-read informed split mapping (PRISM), a method that identifies SVs and their precise breakpoints from whole-genome resequencing data. PRISM uses a split-alignment approach informed by the mapping of paired-end reads, which enables breakpoint identification for multiple SV types, including arbitrary-sized inversions, deletions and tandem duplications. Comparisons to previous datasets and simulation experiments illustrate PRISM's high sensitivity, while PCR validations of PRISM results, including previously uncharacterized variants, indicate an overall precision of ~90%.

AVAILABILITY: PRISM is freely available at http://compbio.cs.toronto.edu/prism.
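The split-alignment idea mentioned in the abstract can be illustrated with a toy sketch. It is not PRISM's algorithm: it assumes exact matching, a single deletion, and a search window that in practice would be derived from the position of the read's mapped mate and the library insert size. The function names `candidate_window` and `find_deletion_breakpoint` and the `max_span` parameter are hypothetical, introduced only for illustration.

```python
def candidate_window(mate_start, max_span, ref_len):
    """Hypothetical helper: region in which the unmapped read is searched,
    given where its mate mapped and an assumed maximum span."""
    return mate_start, min(ref_len, mate_start + max_span)


def find_deletion_breakpoint(reference, read, window_start, window_end, min_anchor=5):
    """Toy split-read search for one deletion, using exact string matching.

    The read is cut at every position that leaves at least `min_anchor` bases
    on each side. The prefix must match inside the window, and the suffix must
    match at least one base further downstream than where the prefix ends.
    Returns the deleted reference interval as (start, end), half-open, or None.
    """
    region = reference[window_start:window_end]
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = region.find(prefix)  # first exact hit only (toy simplification)
        if p == -1:
            continue
        # Start the suffix search one base past the prefix end, so that a
        # contiguous (non-split) alignment is never reported as a deletion.
        s = region.find(suffix, p + split + 1)
        if s == -1:
            continue
        return window_start + p + split, window_start + s
    return None


# Made-up example: ten T bases are deleted between the two flanks.
reference = "GATTACAGGC" + "TTTTTTTTTT" + "CCGTAAGCTA"
read = "GATTACAGGC" + "CCGTAAGCTA"

start, end = candidate_window(mate_start=0, max_span=30, ref_len=len(reference))
print(find_deletion_breakpoint(reference, read, start, end))  # (10, 20)
```

Restricting the search to a window around the mapped mate is what keeps the split search cheap in this sketch; a real implementation would also have to handle mismatches, repeats, and ambiguous breakpoint placement, none of which are modeled here.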
