Sequencing studies in human genetics: design and interpretation

Next-generation sequencing is becoming the primary discovery tool in human genetics. There have been many clear successes in identifying genes that are responsible for Mendelian diseases, and sequencing approaches are now poised to identify the mutations that cause undiagnosed childhood genetic diseases and those that predispose individuals to more common complex diseases. There are, however, growing concerns that the complexity and magnitude of complete sequence data could lead to an explosion of weakly justified claims of association between genetic variants and disease. Here, we provide an overview of the basic workflow in next-generation sequencing studies and emphasize, where possible, measures and considerations that facilitate accurate inferences from human sequencing studies.
