Genetic Variation in an Individual Human Exome

There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein-coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict that ∼1,500 nsSNPs affect protein function, and these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half have lengths that are a multiple of three, which causes insertions or deletions of amino acids in the corresponding protein rather than frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyzing the coding variation in humans that combines multiple bioinformatic methods to home in on possible functional variation.
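The reading-frame distinction for coding indels can be illustrated with a minimal Python sketch. The function name `classify_indel` and the reference/alternate allele representation are assumptions made for illustration, not the pipeline used in this analysis. The sketch considers only indel length and ignores the gene-terminus effects noted above.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Label a coding indel by whether it preserves the reading frame.

    Hypothetical helper: `ref` and `alt` are the reference and alternate
    allele strings for a single indel within a coding exon.
    """
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have equal length")
    # A length change that is a multiple of three adds or removes whole
    # codons; any other length shifts the downstream reading frame.
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# Illustrative alleles only (not data from the study).
examples = [("A", "ATGC"), ("ACGTA", "A"), ("ATTT", "A")]
labels = [classify_indel(ref, alt) for ref, alt in examples]
```

A length-only rule like this is a first-pass filter. Deciding whether a frameshift near a gene terminus is tolerated would also require the indel's position relative to alternative start or stop sites.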
