SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes

The SNP500Cancer database provides sequence and genotype assay information for candidate SNPs useful in mapping complex diseases, such as cancer. The database is an integral component of the NCI Cancer Genome Anatomy Project. SNP500Cancer reports sequence analysis of anonymized control DNA samples (n = 102 Coriell samples representing four self-described ethnic groups: African/African-American, Caucasian, Hispanic and Pacific Rim). The website is searchable by gene, chromosome, gene ontology pathway, dbSNP ID and SNP500Cancer SNP ID. As of October 2005, the database contains >13 400 SNPs, 9124 of which have been sequenced in the SNP500Cancer population. For each analysed SNP, the gene location and >200 bp of surrounding annotated sequence (including nearby SNPs) are provided, together with frequency information for the total sample and for each subpopulation and a test of Hardy–Weinberg equilibrium for each subpopulation. The website provides the conditions for validated sequencing and genotyping assays, as well as genotype results for the 102 samples, in both viewable and downloadable formats. A subset of sequence-validated SNPs with minor allele frequency >5% is entered into a high-throughput genotyping pipeline to determine concordance with the sequencing results for the same 102 samples. In addition, genotype results for selected validated SNP assays (validated meaning 100% concordance between sequence analysis and genotype results) are posted for an additional 280 samples drawn from the Human Diversity Panel (HDP). SNP500Cancer provides an invaluable resource for investigators to select SNPs for analysis, design genotyping assays using validated sequence data, choose assays already validated on one or more genotyping platforms, and select reference standards for genotyping assays. The SNP500Cancer database is freely accessible via its website.
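The per-SNP quantities mentioned above (allele frequency, the >5% minor allele frequency cut-off, Hardy–Weinberg equilibrium and sequence/genotype concordance) can be illustrated with a minimal Python sketch. The abstract does not say which Hardy–Weinberg test the database uses, so the chi-square goodness-of-fit test with one degree of freedom shown here is an assumption, and the function names `allele_stats` and `concordance` are hypothetical and not part of any SNP500Cancer interface.

```python
import math


def allele_stats(n_aa, n_ab, n_bb):
    """Summarise one biallelic SNP from genotype counts (AA, AB, BB).

    Assumption: Hardy-Weinberg equilibrium is assessed with a chi-square
    goodness-of-fit test (1 degree of freedom); the source does not
    specify which test the database applies.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")

    # Allele frequencies from genotype counts (2 alleles per sample).
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    maf = min(p, q)

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {"p": p, "q": q, "maf": maf, "chi2": chi2, "hwe_p": p_value}


def concordance(seq_calls, assay_calls):
    """Fraction of samples whose sequencing and assay genotypes agree.

    Samples with a missing call (None) on either side are ignored.
    Returns None when no sample has both calls.
    """
    pairs = [(s, a) for s, a in zip(seq_calls, assay_calls)
             if s is not None and a is not None]
    if not pairs:
        return None
    return sum(s == a for s, a in pairs) / len(pairs)


if __name__ == "__main__":
    stats = allele_stats(30, 40, 30)  # illustrative counts, not database values
    print(stats)
    print("passes MAF > 5% filter:", stats["maf"] > 0.05)
    print(concordance(["AA", "AB", None], ["AA", "BB", "AB"]))
```

With small subpopulation samples an exact test is often preferred over the chi-square approximation; the chi-square form is used here only because it keeps the arithmetic visible.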
