ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations

Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is that current methods for automated genotype calling are based on clustering approaches, which require either a large number of samples to be analyzed simultaneously or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly because a heterozygote cluster cannot be clearly identified.

Results: As part of the development of two custom single nucleotide polymorphism (SNP) genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’, based on statistical modeling of the raw intensity data rather than model-free clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per-sample basis, allowing accurate genotyping of both inbred and heterozygous samples even when they are analyzed simultaneously. Because clustering is not used explicitly, ALCHEMY performs well on small batches, with accuracy exceeding 99% with as few as 18 samples.

Availability: ALCHEMY is available for both commercial and academic use free of charge and is distributed under the GNU General Public License at http://alchemy.sourceforge.net/

Contact: mhw6@cornell.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
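To illustrate why per-sample inbreeding information matters for genotype calling, the following minimal sketch computes prior genotype probabilities for a biallelic SNP from an allele frequency and an inbreeding coefficient, using the standard population-genetics relationship. This is not the ALCHEMY model itself, which the abstract does not specify; the function name `genotype_priors` and the parameters `p` and `f` are hypothetical and chosen only for illustration.

```python
def genotype_priors(p, f):
    """Prior genotype probabilities for a biallelic SNP.

    p: frequency of allele A (assumed known for this illustration)
    f: inbreeding coefficient of the sample, from 0 (outbred) to 1 (fully inbred)

    Uses the textbook relationship between inbreeding and genotype
    frequencies; this is an assumption, not ALCHEMY's published model.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("p and f must both lie in [0, 1]")
    q = 1.0 - p
    return {
        "AA": p * p + f * p * q,        # homozygotes gain probability as f grows
        "AB": 2.0 * p * q * (1.0 - f),  # heterozygotes vanish as f approaches 1
        "BB": q * q + f * p * q,
    }


if __name__ == "__main__":
    for f in (0.0, 0.5, 1.0):
        print(f, genotype_priors(0.3, f))
```

With `f = 0` the priors reduce to Hardy-Weinberg proportions, while with `f = 1` the heterozygote prior is zero. This matches the situation the abstract describes, in which a batch of inbred lines yields no clear heterozygote cluster.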
