A Two-Dimensional Pooling Strategy for Rare Variant Detection on Next-Generation Sequencing Platforms

We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information about sample identity. DNA from 576 individuals was arranged into four matrices of 12 rows by 12 columns and pooled by row and by column, yielding 96 pools of 12 individuals each. Because pooling is two-dimensional, DNA from each individual is present in exactly one row pool and exactly one column pool. By comparing the variants observed in the row and column pools of a matrix, we can trace rare variants back to the specific individuals who carry them. The pooled DNA samples were enriched for a 250 kb region previously identified by GWAS as significantly predisposing individuals to lung cancer. All 96 pools (12 row and 12 column pools from each of 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument to an average depth of coverage greater than 4,000×. Because each individual sample is sequenced in more than one pool, variant calling is more accurate than with a single pool or a multiplexed approach. Verification by Ion PGM sequencing confirmed 91.4% of the confidently classified SNVs assayed. The approach provides a powerful method for rare variant detection in regions of interest at reduced cost to the researcher.
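The row-and-column bookkeeping can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`pool_ids`, `candidate_carriers`) and the row-major assignment of samples to matrix positions are assumptions made here for illustration, and the sketch covers only the lookup logic, not variant calling from read data.

```python
from itertools import product

N_MATRICES = 4   # four matrices, as in the abstract
SIDE = 12        # each matrix is 12 rows by 12 columns


def pool_ids(sample_index):
    """Map a 0-based sample index to (matrix, row, column).

    Assumes samples are laid out in row-major order; the actual plate
    layout used in the study is not specified in the abstract.
    """
    matrix, offset = divmod(sample_index, SIDE * SIDE)
    row, col = divmod(offset, SIDE)
    return matrix, row, col


def candidate_carriers(positive_rows, positive_cols):
    """List the (row, column) positions in one matrix that are consistent
    with a variant seen in the given row pools and column pools."""
    return sorted(product(positive_rows, positive_cols))


# Sample and pool counts implied by the design.
n_samples = N_MATRICES * SIDE * SIDE   # 576 individuals
n_pools = N_MATRICES * 2 * SIDE        # 96 pools (12 row + 12 column per matrix)

# A variant seen in exactly one row pool and one column pool
# points to a single individual.
singleton = candidate_carriers({3}, {7})

# A variant seen in two row pools and two column pools leaves four
# candidate positions, so it cannot be resolved from pooling alone.
ambiguous = candidate_carriers({1, 4}, {2, 9})
```

In this sketch, a variant carried by a single individual in a matrix resolves to one position, which is why the design suits rare variants. When several individuals in the same matrix carry a variant, the intersection of positive pools contains more positions than true carriers, and follow-up genotyping would be needed to tell them apart.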
