Exome sequencing and complex disease: practical aspects of rare variant association studies

Genetic association and linkage studies can provide insights into complex disease biology, guiding the development of new diagnostic and therapeutic strategies. Over the past decade, genetic association studies have largely focused on common, easy-to-measure genetic variants shared among many individuals. These common variants typically have subtle functional consequences, and translating the resulting association signals into biological insights can be challenging. In the last few years, exome sequencing has emerged as a cost-effective strategy for extending these studies to include rare coding variants, which often have more marked functional consequences. Here, we provide practical guidance on the design and analysis of complex trait association studies focused on rare coding variants.
