Genome-wide association analysis

Genome-wide association (GWA) studies have become very popular in the last few years as a means of identifying associations between particular alleles in an individual’s DNA and a predisposition to disease, using genetic data from unrelated individuals randomly sampled from a population (Balding, 2006; WTCCC, 2007). The recent availability of large amounts of such population genetic data creates a need to test for disease-susceptibility loci both accurately and in a computationally viable manner. Single-marker analysis (SMA), i.e. testing each DNA marker for disease association without including any information from additional DNA markers, has thus far been the primary tool for many GWA studies. However, ignoring the information from neighboring markers may significantly decrease the power to find associated markers. In particular, it is well known in population genetics that allelic types at nearby markers can be correlated, a phenomenon known as linkage disequilibrium (LD). Researchers have tried to exploit LD to increase the power to find markers associated with disease status. The power gained by considering multiple markers in this manner, rather than using simple SMA, has been demonstrated in much of the recent literature (Zöllner and Pritchard, 2005; Minichiello and Durbin, 2006; Mailund et al., 2006; Marchini et al., 2007). The most powerful of these methods thus far (Zöllner and Pritchard, 2005) uses the coalescent to capture LD. The coalescent is a well-developed model that captures how the DNA of “unrelated” individuals may nonetheless be correlated due to the individuals’ shared ancestral history far enough back in time. Such methods attempt to reconstruct the ancestral recombination graph (ARG) of a sample, which represents the complete evolutionary history of the DNA of all the individuals in the sample.
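To make the contrast between SMA and LD concrete, the following minimal Python sketch tests a single marker for association using a 2x2 allelic chi-square test, and measures LD between two markers as the squared correlation r^2 of their alleles. It is an illustration only: the function names, the 0/1 allele coding, the toy data, and the choice of the allelic chi-square test are assumptions made here, not the procedure of any of the cited studies.

```python
import math


def allelic_chi2(case_alleles, control_alleles):
    """Single-marker allelic test: 2x2 chi-square on allele counts (1 df).

    Each argument is a list of 0/1 allele codes, one entry per chromosome
    (hypothetical coding chosen for this sketch).
    Returns (chi-square statistic, p-value).
    """
    a = sum(case_alleles)                  # allele 1 among cases
    b = len(case_alleles) - a              # allele 0 among cases
    c = sum(control_alleles)               # allele 1 among controls
    d = len(control_alleles) - c           # allele 0 among controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Monomorphic marker or an empty group: no evidence either way.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def ld_r2(hap_a, hap_b):
    """Squared allelic correlation r^2 between two markers.

    hap_a[i] and hap_b[i] are the 0/1 alleles carried by the same
    phased haplotype i at marker A and marker B.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(x * y for x, y in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b                   # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Toy data: allele 1 is carried by 30 of 40 case chromosomes
# and by 10 of 40 control chromosomes.
cases = [1] * 30 + [0] * 10
controls = [1] * 10 + [0] * 30
chi2, p_value = allelic_chi2(cases, controls)

# Two markers in complete LD, and two markers with no LD.
r2_linked = ld_r2([1, 1, 0, 0], [1, 1, 0, 0])
r2_unlinked = ld_r2([1, 1, 0, 0], [1, 0, 1, 0])
```

SMA applies a test like `allelic_chi2` to each marker in isolation. When `ld_r2` between neighboring markers is high, those markers carry overlapping information about a nearby disease locus, and this is the information that multi-marker and ARG-based methods try to use.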
If the ARG can be accurately reconstructed for a particular region of the genome, one can check whether the ARG tends to cluster individuals with and without the disease separately, which would indicate a potential disease-associated locus (or loci) in the region. While the method of Zöllner and Pritchard (2005) has proven highly accurate in achieving this goal, its techniques for reconstructing the ARG are computationally expensive, so the method cannot be applied to the large amounts of genetic data available today. Therefore, several methods have recently been developed that reconstruct the ARG less rigorously (Minichiello and Durbin, 2006; Mailund et al., 2006). Such methods attempt