Very low-depth whole-genome sequencing in complex trait association studies

Motivation: Very low-depth sequencing has been proposed as a cost-effective approach to capture low-frequency and rare variation in complex trait association studies. However, a full characterisation of the genotype quality and association power of very low-depth sequencing designs is still lacking.

Results: We perform cohort-wide whole-genome sequencing (WGS) at low depth in 1,239 individuals (990 at 1x depth and 249 at 4x depth) from an isolated population, and establish a robust pipeline for calling and imputing very low-depth WGS genotypes using standard bioinformatics tools. Using genotyping chip, whole-exome sequencing (WES, 75x depth) and high-depth (22x) WGS data in the same samples, we examine in detail the sensitivity of this approach, and show that imputed 1x WGS recapitulates 95.2% of variants found by imputed GWAS, with an average minor allele concordance of 97% for common and low-frequency variants. In our study, 1x WGS further allowed the discovery of 140,844 true low-frequency variants with 73% genotype concordance when compared to high-depth WGS data. Finally, using association results for 57 quantitative traits, we show that very low-depth WGS is an efficient alternative to imputed GWAS chip designs, allowing the discovery of up to twice as many true association signals as the classical imputed GWAS design.

Supplementary Data: Supplementary Data are appended to this manuscript.
