Whole genome sequencing and its applications in medical genetics

Genome sequencing has improved fundamentally since next-generation sequencing (NGS) emerged in the 2000s. These technologies combine massively parallel short-read DNA sequencing with genome alignment and assembly methods to interrogate genomes digitally and rapidly on an unprecedented scale, making large-scale whole genome sequencing (WGS) accessible and practical for researchers. WGS is now increasingly used to identify the genetic basis of diseases, study causal relationships in cancer, perform genome-level comparative analyses, reconstruct human population history, and inform clinical practice. In this review, we first describe a typical WGS pipeline: laboratory template preparation, sequencing, genome assembly and quality control, and variant calling and annotation. We then compare whole genome sequencing with whole exome sequencing (WES) and explore a wide range of WGS applications for both Mendelian and complex diseases in medical genetics. Finally, we highlight the impact of WGS on cancer studies, regulatory variant analysis, and predictive and precision medicine, and discuss the challenges that WGS still faces.
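To make the variant-calling step of the pipeline more concrete, the following is a toy sketch in Python. It assumes reads have already been aligned to a reference and summarized as per-position pileups (the bases observed at each position). The function name `call_snvs`, the depth and allele-fraction thresholds, and the simple genotyping rule are all hypothetical illustrations, not the method of any tool discussed in this review; production callers such as GATK use statistical models that also account for base and mapping quality.

```python
from collections import Counter


def call_snvs(reference, pileups, min_depth=10, min_alt_fraction=0.2, hom_alt_fraction=0.8):
    """Toy single-nucleotide variant caller (hypothetical thresholds).

    reference: reference sequence string, indexed from position 0
    pileups:   dict mapping position -> string of bases seen in aligned reads
    Returns a list of (position, ref_base, alt_base, depth, alt_fraction, genotype).
    """
    calls = []
    for pos in sorted(pileups):
        bases = pileups[pos].upper()
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to support a call
        ref_base = reference[pos].upper()
        # Count only unambiguous non-reference bases (ignores N and other symbols).
        alt_counts = Counter(b for b in bases if b in "ACGT" and b != ref_base)
        if not alt_counts:
            continue  # every informative read matches the reference
        alt_base, alt_count = alt_counts.most_common(1)[0]
        alt_fraction = alt_count / depth
        if alt_fraction < min_alt_fraction:
            continue  # likely sequencing error rather than a true variant
        # Crude genotype: mostly-alternate reads suggest homozygous alternate.
        genotype = "1/1" if alt_fraction >= hom_alt_fraction else "0/1"
        calls.append((pos, ref_base, alt_base, depth, round(alt_fraction, 2), genotype))
    return calls


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    pileups = {
        2: "GGGGGAAAAA",    # half the reads carry A: heterozygous-looking
        3: "CCCCCCCCCT",    # nearly all reads carry C: homozygous-looking
        5: "CCCCCCCCCCCC",  # matches the reference
        7: "TTT",           # too shallow to call
    }
    for call in call_snvs(ref, pileups):
        print(call)
```

Running the example prints one heterozygous-looking call at position 2 and one homozygous-looking call at position 3; positions 5 and 7 are skipped because they match the reference or lack depth.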
