Genome-wide association studies and human disease: from trickle to flood.

Many common complex diseases such as hypertension, diabetes, coronary heart disease, psychiatric disorders, and some cancers have a genetic etiology. Despite enormous efforts over the last few decades, little real progress was made in finding the genes and causal variants involved. Genome-wide association studies, in which hundreds of thousands of DNA markers are tested (usually in a case-control design) for association with disease, provide the first effective approach to search for genetic variants that contribute to the complex etiology of common human diseases. In the last 3 years, almost 1000 variants associated with a range of human traits and common diseases have been identified using genome-wide association methods (FIGURE). To date, most of these studies have been in populations of European descent. Genome-wide search strategies developed from advances in genotyping technology, greater understanding of the structure of common variation in the human genome, and continued advances in computing power and software tools. Commercial genotyping platforms can type as many as 1 million single-nucleotide polymorphisms (SNPs) on a single chip, capturing (tagging) most variation between individuals in a single experiment. Because neighboring SNPs are correlated, it is not necessary to genotype in each sample all of the 10 million to 15 million common SNPs that segregate in the population; a much smaller subset of approximately 500 000 SNPs is sufficient to cover common variation in the genome. The flipside of this redundancy is that a SNP that is statistically associated with disease is unlikely to be causal itself and is more likely to be correlated with an ungenotyped causal variant.
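To make the per-marker test in a case-control design concrete, the following is a minimal sketch of one common way to test a single SNP: compare allele counts in cases and controls with a 1-degree-of-freedom chi-square test and report the allelic odds ratio. The function name, the allele counts, and the choice of this particular test are illustrative assumptions, not the method of any specific study; real analyses typically also adjust for covariates and population structure.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts.

    Hypothetical helper for illustration: rows are cases and controls,
    columns are counts of the risk (alt) and reference alleles.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Allelic odds ratio: odds of carrying the risk allele in cases vs controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Made-up counts: 2000 cases and 2000 controls, i.e. 4000 alleles per group.
    odds_ratio, chi2, p_value = allelic_association(1200, 2800, 1000, 3000)
    print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.1f}, p = {p_value:.2e}")

    # Assumed Bonferroni-style threshold for roughly 500 000 tested markers.
    threshold = 0.05 / 500_000
    print("passes threshold:", p_value < threshold)
```

Because the same test is repeated for every marker on the chip, the significance threshold has to be far stricter than the conventional 0.05. The Bonferroni-style cutoff used here is one simple assumption for handling that, and it shows why large samples are needed to detect a modest odds ratio.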
Genome-wide association studies have provided insights about disease, in particular: (1) for almost any disease that has been investigated, there are SNP variants common in the population (with an allele frequency >5%) that are robustly associated with disease; (2) most of these variants are in genes that contribute to biological pathways that were previously not known to be involved in disease or are nowhere near a known protein-coding gene; (3) the effect sizes of associated SNPs are typically small, with odds ratios of risk alleles in the range of approximately 1.1 to 1.5; (4) for any particular disease, accumulating the effects of many different SNPs associated with a disease usually explains only a small fraction of the familial risk (or heritability); and (5) not all diseases and traits are alike in genetic architecture. For example, in age-related macular degeneration, approximately 50% of genetic variation has been accounted for by only 6 loci, whereas for adult height, only 6% of genetic variation has been accounted for by approximately 50 loci. In the iron homeostasis pathway, several common SNPs have been reported that each explain 5% or more of genetic variation. Failure to account for much of the genetic variation, or “missing heritability,” has divided the community with respect to the success or failure of genome-wide association studies. Since the common goal is to understand pathways to disease and develop improved methods of prevention, diagnosis, and treatment, it is important to understand what the current flood of associated SNPs reveals about disease biology and gene regulation, why so little genetic variation has been accounted for, and what experimental approaches might lead to the identification of causal variants and mechanisms.

Figure. The Genome-wide Association Revolution: From Trickle to Flood