High-throughput discovery of rare insertions and deletions in large cohorts.

Pooled-DNA sequencing strategies enable fast, accurate, and cost-effective detection of rare variants, but current approaches cannot accurately identify short insertions and deletions (indels), despite their pivotal role in genetic disease. Furthermore, the sensitivity and specificity of these methods depend on arbitrary, user-selected significance thresholds, whose optimal values change from experiment to experiment. Here, we present a combined experimental and computational strategy that pairs a synthetically engineered DNA library, included in each run, with a new computational approach named SPLINTER that detects and quantifies short indels and substitutions in large pools. SPLINTER integrates information from the synthetic library to select the optimal significance thresholds for every experiment. We show that SPLINTER detects indels (up to 4 bp) and substitutions in large pools with high sensitivity and specificity, accurately quantifies variant frequency (r = 0.999), and compares favorably with existing algorithms for the analysis of pooled sequencing data. We applied our approach to a cohort of 1152 individuals, identifying 48 variants and validating 14 of 14 (100%) predictions by individual genotyping. Thus, our strategy provides a sensitive method that will speed the discovery of novel disease-causing rare variants.
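
The idea of calibrating a significance threshold against a control with known content can be illustrated with a minimal sketch. This is not SPLINTER's implementation: the function name `pick_cutoff`, the p-value-like scores, and the selection criterion (maximizing sensitivity plus specificity) are assumptions made for illustration only. The sketch takes scores at control positions known to carry a variant and at control positions known to be wild type, and returns the cutoff that best separates them.

```python
from typing import Sequence, Tuple


def pick_cutoff(
    pos_scores: Sequence[float], neg_scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Choose a cutoff from control positions with known truth.

    Hypothetical helper, not part of SPLINTER. Scores are p-value-like:
    smaller means stronger evidence for a variant, and a position is
    called when score <= cutoff.

    pos_scores: scores at control positions known to carry a variant.
    neg_scores: scores at control positions known to be wild type.
    Returns (cutoff, sensitivity, specificity) for the cutoff that
    maximizes sensitivity + specificity (an assumed criterion).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both control sets must be non-empty")

    best = None
    # Only observed scores can change which positions are called.
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        sensitivity = sum(s <= cutoff for s in pos_scores) / len(pos_scores)
        specificity = sum(s > cutoff for s in neg_scores) / len(neg_scores)
        total = sensitivity + specificity
        # Strict comparison keeps the most stringent cutoff on ties.
        if best is None or total > best[0]:
            best = (total, cutoff, sensitivity, specificity)

    return best[1], best[2], best[3]


# Made-up control scores, for illustration only.
known_variant = [1e-12, 1e-9, 1e-7]
known_wild_type = [1e-5, 1e-3, 0.2, 0.5]
cutoff, sens, spec = pick_cutoff(known_variant, known_wild_type)
```

Because the cutoff is recomputed from the control in every run, it follows run-specific error rates rather than a fixed user-selected value, which is the motivation stated above.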
