Whole genome sequencing.

Whole genome sequencing provides the most comprehensive catalogue of an individual's genetic variation. As the cost of sequencing technology falls, we envision a paradigm shift from microarray-based genotyping studies to whole genome sequencing. We review methodologies for whole genome sequencing. There are two main approaches for assembling short shotgun sequence reads into longer contiguous genomic sequences. In the de novo assembly approach, sequence reads are compared with one another, and overlapping reads are merged to build longer contiguous sequences. The reference-based assembly approach instead maps each read to a reference genome sequence. We discuss methods for identifying genetic variation (single nucleotide polymorphisms, small indels, and copy number variants) and for building haplotypes from genome assemblies, along with potential pitfalls. We expect methodologies to evolve rapidly as sequencing technologies improve and more human genomes are sequenced.
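The overlap-based de novo approach described above can be illustrated with a toy greedy assembler. This is a minimal sketch under strong assumptions: error-free reads from a single strand (no reverse complements), exact overlaps only, and no repeats longer than the minimum overlap. The names `overlap`, `remove_contained`, `greedy_assemble` and the parameter `min_len` are hypothetical, and greedy merging is a simplified stand-in rather than the method of any particular assembler; short-read assemblers generally use de Bruijn or string graphs instead of this all-pairs search.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if there is no such overlap of at least `min_len` bases."""
    start = 0
    while True:
        # Jump to the next place in `a` where b's first min_len bases occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of `a` from here is a prefix of `b`, it is an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def remove_contained(reads: list[str]) -> list[str]:
    """Drop exact duplicates and reads that lie entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the ordered pair of sequences with the longest overlap."""
    contigs = remove_contained(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: the remaining sequences are contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        rest = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs = remove_contained(rest + [merged])


if __name__ == "__main__":
    reads = ["TGCCATTA", "ACGTTGCA", "CCATTAGG", "TGCATGCC"]
    print(greedy_assemble(reads, min_len=3))  # ['ACGTTGCATGCCATTAGG']
```

Each merge round compares every ordered pair of sequences, so the cost grows quadratically with the number of reads, which is one reason this strategy does not scale to millions of short reads.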
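The reference-based approach and the identification of single nucleotide polymorphisms can likewise be sketched in a few lines. This is an illustration under stated assumptions, not a production variant caller: alignments are ungapped (so small indels are not detected), only the forward strand is considered, base-quality scores are ignored, and genotypes come from simple allele-fraction cut-offs for a diploid genome. Reads are placed by looking up non-overlapping k-mer seeds in a hash index of the reference and keeping the candidate position with the fewest mismatches; the aligned bases are then piled up per reference position. All names and thresholds (`build_index`, `map_read`, `call_snps`, `k`, `max_mismatches`, `min_depth`, and the 0.2/0.8 cut-offs) are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read: str, ref: str, index: dict[str, list[int]], k: int,
             max_mismatches: int) -> Optional[int]:
    """Return the ungapped start position with the fewest mismatches, or None."""
    candidates = set()
    # Look up non-overlapping k-mer seeds; each hit implies a candidate start.
    for off in range(0, len(read) - k + 1, k):
        for pos in index.get(read[off:off + k], []):
            start = pos - off
            if 0 <= start <= len(ref) - len(read):
                candidates.add(start)
    best: Optional[tuple[int, int]] = None
    for start in sorted(candidates):
        window = ref[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return None if best is None else best[0]


def call_snps(ref: str, reads: list[str], k: int = 4, max_mismatches: int = 2,
              min_depth: int = 3) -> list[tuple[int, str, str, str]]:
    """Map reads, pile up bases per reference position, and report SNP calls."""
    index = build_index(ref, k)
    pileup: dict[int, Counter] = defaultdict(Counter)
    for read in reads:
        start = map_read(read, ref, index, k, max_mismatches)
        if start is None:
            continue  # unmapped reads are simply skipped
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        alt_counts = Counter({b: c for b, c in counts.items() if b != ref[pos]})
        if depth < min_depth or not alt_counts:
            continue
        alt, alt_count = alt_counts.most_common(1)[0]
        fraction = alt_count / depth
        # Hypothetical diploid cut-offs: mostly non-reference bases => homozygous,
        # a substantial minority of non-reference bases => heterozygous.
        if fraction >= 0.8:
            calls.append((pos, ref[pos], alt, "hom"))
        elif fraction >= 0.2:
            calls.append((pos, ref[pos], alt, "het"))
    return calls


if __name__ == "__main__":
    ref = "ACGTTGCATGCCATTAGGCT"
    reads = ["CGTTGCATGA", "TTGCATGACA", "GCATGACATT", "ATGACATTAG", "GACATTAGGC"]
    print(call_snps(ref, reads))  # [(10, 'C', 'A', 'hom')]
```

Using non-overlapping seeds means a read with m mismatches is guaranteed an exactly matching seed only when it contains at least m + 1 seeds; with 10-base reads and k = 4 there are just two seeds, so a read with two mismatches may fail to map even though `max_mismatches` allows it.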
