Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm.

Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased the redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set once they are covered by imputation is more efficient than relying on pairwise tagging alone. We therefore developed a novel hybrid SNP selection method for the African American and Latino arrays that alternates rounds of greedy pairwise SNP selection with removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies.
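
The hybrid selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`greedy_round`, `hybrid_select`), the r² threshold of 0.8, the batch size, and the `imputed_covered` callback are all assumptions introduced here for illustration. In practice the callback would wrap a real imputation run; below it is replaced by a toy rule.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

R2 = Dict[FrozenSet[str], float]  # pairwise r^2, keyed by an unordered SNP pair


def greedy_round(r2: R2, remaining: Set[str], candidates: Set[str],
                 chosen: List[str], threshold: float,
                 batch_size: int) -> Tuple[List[str], Set[str]]:
    """Pick up to batch_size SNPs; each pick tags the most remaining targets."""
    remaining = set(remaining)
    picked: List[str] = []
    for _ in range(batch_size):
        best, best_cov = None, set()
        # Sorted iteration makes tie-breaking deterministic.
        for c in sorted(candidates - set(chosen) - set(picked)):
            cov = {t for t in remaining
                   if t == c or r2.get(frozenset((c, t)), 0.0) >= threshold}
            if len(cov) > len(best_cov):
                best, best_cov = c, cov
        if best is None:  # no candidate tags any remaining target
            break
        picked.append(best)
        remaining -= best_cov
    return picked, remaining


def hybrid_select(r2: R2, targets: Set[str], candidates: Set[str],
                  imputed_covered: Callable[[List[str], Set[str]], Set[str]],
                  threshold: float = 0.8, batch_size: int = 1,
                  max_rounds: int = 100) -> Tuple[List[str], Set[str]]:
    """Alternate greedy pairwise rounds with removal of imputation-covered targets."""
    chosen: List[str] = []
    remaining = set(targets)
    for _ in range(max_rounds):
        if not remaining:
            break
        picked, remaining = greedy_round(r2, remaining, candidates,
                                         chosen, threshold, batch_size)
        if not picked:
            break
        chosen.extend(picked)
        # Hypothetical step: drop targets that imputation from the chosen
        # SNPs already covers, so no array capacity is spent tagging them.
        remaining -= imputed_covered(chosen, remaining)
    return chosen, remaining


# Toy data (invented): five SNPs with a few pairwise r^2 values.
toy_r2: R2 = {frozenset(("A", "B")): 0.9,
              frozenset(("A", "C")): 0.85,
              frozenset(("D", "E")): 0.5}
snps = {"A", "B", "C", "D", "E"}


def toy_imputation(chosen: List[str], remaining: Set[str]) -> Set[str]:
    """Stand-in for imputation: a target counts as covered at r^2 >= 0.5."""
    return {t for t in remaining
            if any(toy_r2.get(frozenset((c, t)), 0.0) >= 0.5 for c in chosen)}


hybrid, left = hybrid_select(toy_r2, snps, snps, toy_imputation)
pairwise, _ = hybrid_select(toy_r2, snps, snps, lambda c, r: set())
print(hybrid, left, pairwise)
```

On this toy input the hybrid run needs fewer SNPs than the pairwise-only run, because SNP E is dropped from the target set once the stand-in imputation rule covers it. The size of that saving on real data depends entirely on the imputation method and reference panel, which this sketch does not model.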
