Using Duplicate Genotyped Data in Genetic Analyses: Testing Association and Estimating Error Rates

Although researchers use duplicate genotyped data to calculate an inconsistency rate, no power analysis has been available to assess the value of the duplicate data. In this paper, we present a model that relates the genotyping error rate to the inconsistency rate. We extend the g-genotype by h-phenotype chi-squared test to incorporate the duplicate genotyped data. When a subject is inconsistently genotyped (that is, has two different observed genotypes), our procedure allocates 0.5 units to each of the two genotypes. We specify the multivariate analysis of variance (MANOVA) test comparing these extended counts. We provide freely available software for this test and for a permutation test intended for small samples. A simulation study shows that the asymptotic null distribution of the MANOVA test holds when the total number of subjects, N, is at least 300. Simulations also show that the power computed from the asymptotic distribution of this test under various alternative hypotheses is a satisfactory approximation to the simulated power. In all cases, the power of the MANOVA test using the duplicate genotyped data is greater than the power of the chi-squared test that ignores the duplicate data. Power increases ranged from 0.776% to 4.652% for 80% powered tests and from 0.292% to 2.028% for 95% powered tests. Researchers can now compute the value of the duplicate genotyped data as part of the design of a study.
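The allocation rule for inconsistently genotyped subjects can be illustrated with a minimal sketch. It is not the authors' software: the function name `extended_counts`, the input layout, and the example genotype and phenotype labels are hypothetical. It also assumes that a subject genotyped once, or genotyped twice with matching calls, contributes one full unit to its genotype, which the abstract implies but does not state.

```python
def extended_counts(subjects, genotypes, phenotypes):
    """Build a g-by-h table of genotype-by-phenotype counts.

    subjects: iterable of (phenotype, first_call, second_call) tuples;
    second_call is None when the subject was genotyped only once.
    Returns a nested dict: table[genotype][phenotype] -> float count.
    """
    table = {g: {p: 0.0 for p in phenotypes} for g in genotypes}
    for phenotype, first, second in subjects:
        if second is None or second == first:
            # Assumption: a single or consistent duplicate call counts as 1 unit.
            table[first][phenotype] += 1.0
        else:
            # Inconsistent duplicate: 0.5 units to each observed genotype.
            table[first][phenotype] += 0.5
            table[second][phenotype] += 0.5
    return table


# Hypothetical toy data: five subjects, three genotypes, two phenotypes.
subjects = [
    ("case", "AA", None),     # genotyped once
    ("case", "AB", "AB"),     # consistent duplicate
    ("case", "AB", "BB"),     # inconsistent duplicate
    ("control", "AA", "AB"),  # inconsistent duplicate
    ("control", "BB", None),  # genotyped once
]
counts = extended_counts(subjects, ["AA", "AB", "BB"], ["case", "control"])
total = sum(sum(row.values()) for row in counts.values())
```

Because each subject contributes exactly one unit in total, the extended counts still sum to the number of subjects, N. The sketch only builds the table; it does not implement the MANOVA or permutation tests.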