Efficient discovery of single-nucleotide polymorphisms in coding regions of human genes

Single nucleotide polymorphisms in protein coding regions (cSNPs) are of great interest for their effects on phenotype and their potential for mapping disease genes. We have identified 5400 novel exonic SNPs from alignments of public EST data to the draft human genome sequence, and approximately 12 000 more novel exonic SNPs from EST cluster alignments. We found 82% of the genomic-aligned SNPs and 63% of the EST-only SNPs to be detectably polymorphic in 20 Finnish DNA samples. Of the SNPs, 37% mapped to known protein coding regions, yielding 6500 distinct, novel cSNPs from the two datasets. These data reveal selection against mutations that alter protein structure, and distinct classes of genes under strongly positive vs. negative pressure from natural selection for amino acid replacement (detected by the KA/KS ratio). We have screened these cSNPs for compatibility with the amino acid profile at each site and for their structural impact on protein core stability.
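To make the distinction behind the KA/KS analysis concrete, the following minimal sketch classifies a single coding SNP as synonymous, nonsynonymous (amino acid replacement), or nonsense, using the standard genetic code. It is an illustration only, not the pipeline used in this work; the function name `classify_csnp` and its arguments are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_csnp(codon: str, pos: int, alt_base: str) -> str:
    """Classify a substitution at 0-based position `pos` of `codon`.

    Hypothetical helper for illustration: returns 'synonymous',
    'nonsynonymous', or 'nonsense' (a stop codon is gained or lost).
    """
    codon = codon.upper()
    alt_codon = codon[:pos] + alt_base.upper() + codon[pos + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if "*" in (ref_aa, alt_aa):
        return "nonsense"
    return "nonsynonymous"


if __name__ == "__main__":
    print(classify_csnp("GAA", 2, "G"))  # Glu -> Glu
    print(classify_csnp("GAA", 1, "T"))  # Glu -> Val
    print(classify_csnp("TGG", 2, "A"))  # Trp -> stop
```

A KA/KS estimate additionally normalizes the counts of nonsynonymous and synonymous changes by the number of sites of each kind available in the gene, which this sketch does not attempt.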
