AbCD: arbitrary coverage design for sequencing-based genetic studies

Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface that provides pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage (0.5-30×) and sample size (20-10 000) in four major ethnic groups (Europeans, Africans, Asians and African Americans). We also present ShotGun and DesignPlanner, the two software tools used to generate the estimates behind AbCD. ShotGun is a flexible short-read simulator that accepts arbitrary user-specified read lengths and average depths and allows cycle-specific sequencing error rates and realistic read depth distributions. DesignPlanner is a full pipeline that uses ShotGun to generate sequence data, performs initial SNP discovery, calls genotypes with our previously presented linkage disequilibrium-aware method and, finally, reports minor allele frequency-specific effective sample sizes. Together, ShotGun and DesignPlanner can estimate effective sample sizes for any combination of high-depth and low-depth data (for example, whole-genome low-depth plus exonic high-depth sequencing) or of sequence and genotype data [for example, whole-exome sequencing plus genotypes from an existing genome-wide association study (GWAS)].
