Human Genome Variation: Haplotypes, Linkage Disequilibrium, and Populations - Session Introduction

The working draft of a reference sequence of the human genome is nearing completion and is providing a basis for studies in a variety of domains. Computational challenges exist in all of these domains because of the massive amounts of data, the multiple, often complex relationships among the types of relevant data, and the need to make the data accessible to researchers approaching them from different perspectives. One aspect of genomic data of broad relevance is the variation in the DNA sequence among the billions of separate copies existing in the several billion living humans. Some of that variation is "abnormal" and is the basis for inherited diseases. However, most of the variation is normal and simply makes each of us unique. Yet this common, normal variation is of great biomedical relevance because it can alter disease susceptibility, physiologic reactions to drugs, and responses to environmental stimuli. The variation is also relevant to anthropology and to understanding human evolution. In fact, several aspects of DNA sequence variation are consequences of recent human evolution: the amount of variation, the distribution of variation among human populations, and the organization of variation along the DNA sequence. This last issue has become increasingly interesting as millions of single nucleotide polymorphisms (SNPs) have been identified and mapped. With multiple SNPs mapped to every small segment of DNA, the focus has shifted from the individual SNP to groups of SNPs considered as haplotypes (haploid genotypes), with the common finding that for n SNPs in a small segment of DNA there are usually far fewer than the 2^n haplotypes expected by chance. This non-randomness, commonly referred to as linkage disequilibrium (LD), is adding a further level of complexity to genetic databases and analytic programs.
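To make the haplotype-count observation concrete, the following is a minimal sketch in Python. It is illustrative only: the toy haplotypes are invented, the function names are hypothetical, and the two pairwise statistics it computes (D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))) are standard LD measures that this introduction does not itself name. The sketch assumes biallelic SNPs with alleles coded 0/1 and phased haplotypes.

```python
from collections import Counter

# Hypothetical toy data (not from any real study): each string is one phased
# haplotype typed at n = 3 biallelic SNPs, with alleles coded "0" and "1".
haplotypes = ["000", "000", "000", "011", "011", "111", "111", "111"]


def haplotype_counts(haps):
    """Count how many times each distinct haplotype is observed."""
    return Counter(haps)


def pairwise_ld(haps, i, j):
    """Return (D, r2) for the SNPs at positions i and j.

    D  = p_AB - p_A * p_B, where A and B are the "1" alleles at SNPs i and j.
    r2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B)); NaN if either SNP is
    monomorphic in the sample.
    """
    n = len(haps)
    p_a = sum(h[i] == "1" for h in haps) / n
    p_b = sum(h[j] == "1" for h in haps) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haps) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


n_snps = len(haplotypes[0])
observed = haplotype_counts(haplotypes)
print(f"{len(observed)} distinct haplotypes observed out of {2 ** n_snps} possible")

for i in range(n_snps):
    for j in range(i + 1, n_snps):
        d, r2 = pairwise_ld(haplotypes, i, j)
        print(f"SNPs {i},{j}: D = {d:.4f}, r^2 = {r2:.2f}")
```

In this toy sample only three of the eight possible three-SNP haplotypes occur, and the second and third SNPs always carry the same allele, which is the kind of non-random association the text describes as LD. Real data would also require handling of missing genotypes and unphased samples, which this sketch ignores.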