SNiPer-HD: improved genotype calling accuracy by an expectation-maximization algorithm for high-density SNP arrays

MOTIVATION: The technology to genotype single nucleotide polymorphisms (SNPs) at extremely high densities enables hypothesis-free genome-wide scans for common polymorphisms associated with complex disease. However, we find that some errors introduced by commonly employed genotyping algorithms may inflate the number of false associations between markers and phenotype.

RESULTS: We have developed a novel SNP genotype calling program, SNiPer-High Density (SNiPer-HD), for highly accurate genotype calling across hundreds of thousands of SNPs. The program employs an expectation-maximization (EM) algorithm with parameters based on a training sample set. This choice of algorithm allows highly accurate genotyping for most SNPs. We also introduce a quality control metric for each assayed SNP that correlates with genotype class separation in the calling algorithm, so that poorly behaving SNPs can be filtered out. SNiPer-HD is superior to the standard dynamic modeling algorithm and is complementary and non-redundant to other algorithms, such as BRLMM. Implementing multiple algorithms together may provide highly accurate genotype calls without inflation of false positives due to systematically miscalled SNPs. A reliable and accurate set of SNP genotypes for increasingly dense panels will eliminate some false association signals and false negative signals, allowing for rapid identification of disease susceptibility loci for complex traits.

AVAILABILITY: SNiPer-HD is available at TGen's website: http://www.tgen.org/neurogenomics/data.
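To make the calling idea concrete, the following is a minimal sketch of EM-based genotype calling for a single SNP. It is not the SNiPer-HD implementation. It assumes each sample is summarized by one hypothetical allele contrast value, roughly (A - B) / (A + B), models the three genotype classes as one-dimensional Gaussians, and starts EM from parameters that stand in for values learned on a training set. The abstract does not specify the quality control metric; the mean silhouette width is used here only as one plausible measure of genotype class separation. All function names, parameters, and data are illustrative.

```python
import math

GENOTYPES = ("AA", "AB", "BB")  # class order used throughout this sketch


def normal_pdf(x, mean, sd):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_call(contrasts, init_means, init_sds, init_weights,
            n_iter=50, min_sd=0.02):
    """Call genotypes for one SNP with a 3-class Gaussian mixture.

    init_* are assumed to come from a training sample set (hypothetical).
    Returns (calls, fitted_means).
    """
    means, sds, weights = list(init_means), list(init_sds), list(init_weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample.
        resp = []
        for x in contrasts:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j])
                    for j in range(k)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate class parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # empty class: keep the training parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, contrasts)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, contrasts)) / nj
            sds[j] = max(math.sqrt(var), min_sd)  # floor avoids collapse
            weights[j] = nj / len(contrasts)
    calls = [GENOTYPES[max(range(k), key=lambda j: r[j])] for r in resp]
    return calls, means


def mean_silhouette(contrasts, calls):
    """Mean silhouette width of the called classes (None if < 2 classes)."""
    clusters = {}
    for x, g in zip(contrasts, calls):
        clusters.setdefault(g, []).append(x)
    if len(clusters) < 2:
        return None
    widths = []
    for x, g in zip(contrasts, calls):
        own = clusters[g]
        if len(own) == 1:
            widths.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the sample's own class.
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: mean distance to the nearest other class.
        b = min(sum(abs(x - y) for y in other) / len(other)
                for h, other in clusters.items() if h != g)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data: three well-separated groups of samples for one SNP.
contrasts = [-0.9, -0.85, -0.95, 0.0, 0.05, -0.05, 0.9, 0.88, 0.93]
calls, fitted_means = em_call(
    contrasts,
    init_means=(0.8, 0.0, -0.8),   # hypothetical training-set centers
    init_sds=(0.2, 0.2, 0.2),
    init_weights=(1 / 3, 1 / 3, 1 / 3),
)
score = mean_silhouette(contrasts, calls)
```

In a sketch like this, a SNP whose separation score falls below a chosen cutoff would be flagged for filtering. The cutoff itself is a tuning decision that the abstract does not specify.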
