Accelerated Gene Counting for Haplotype Frequency Estimation

Current implementations of the EM algorithm for estimating haplotype frequencies from genotypes on proximal loci require computational resources that grow as $n h_k^2$, where $n$ is the number of individuals genotyped and $h_k$ is the number of haplotypes possible on $k$ loci. For diallelic loci, $h_k = 2^k$. We present an approach whose computational requirement grows as $n 2^t$, where $t$ is the largest number of loci at which an individual in the sample is heterozygous. The method is illustrated by haplotype frequency estimation from a sample of 45 individuals genotyped at 26 single nucleotide polymorphisms in the PIK3R1 gene.
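To make the scaling argument concrete, the following is a minimal Python sketch of gene counting that visits only the haplotype pairs compatible with each observed genotype. It is an illustration written for this summary, not the authors' implementation. It assumes diallelic loci, no missing genotypes, and a hypothetical coding in which each locus is recorded as 0, 1, or 2 copies of one allele. The function names `compatible_pairs` and `em_haplotype_frequencies` are likewise hypothetical. An individual heterozygous at t loci has 2^(t-1) unordered compatible pairs, so the work per individual depends on t rather than on all 2^k haplotypes.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one genotype.

    genotype: tuple with 0, 1, or 2 copies of allele '1' per locus
    (assumed coding; diallelic loci, no missing data).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous loci are fixed (0 -> allele 0, 2 -> allele 1);
    # heterozygous loci get a placeholder that is overwritten below.
    base = [g // 2 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Pin the phase at the first heterozygous locus so that each
    # unordered pair is generated exactly once: 2**(t - 1) pairs.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for locus, b in zip(het, (0,) + bits):
            h1[locus], h2[locus] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate haplotype frequencies by EM over compatible pairs only."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in pair_lists for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chromosomes = 2 * len(genotypes)
    for _ in range(iterations):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior weight of each pair under random mating
            # (2*p_a*p_b for distinct haplotypes, p_a**2 for identical).
            weights = [(1 if a == b else 2) * freq[a] * freq[b]
                       for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        freq = {h: counts[h] / n_chromosomes for h in haplotypes}
    return freq
```

For example, calling `em_haplotype_frequencies([(0, 0), (0, 0), (2, 2), (1, 1)])` considers only four haplotypes and, for the double heterozygote, only two candidate phase assignments.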