A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts

The Nurses’ Health Study (NHS), Nurses’ Health Study II (NHSII), Health Professionals Follow-up Study (HPFS) and Physicians’ Health Study (PHS) have collected detailed longitudinal data on multiple exposures and traits for approximately 310,000 study participants over the last 35 years. Over 160,000 participants across the cohorts have donated a DNA sample and, to date, 20,691 subjects have been genotyped as part of genome-wide association studies (GWAS) of twelve primary outcomes. However, these studies used six different GWAS arrays, making it difficult to analyze secondary phenotypes or to share controls across studies. To allow for secondary analyses of these data, we created three new datasets merged by platform family and performed imputation against a common reference panel, the 1000 Genomes Project Phase I release. Here, we describe the methodology behind the data merging and imputation, and present imputation quality statistics and association results from two GWAS of secondary phenotypes: body mass index (BMI) and venous thromboembolism (VTE). We observed the strongest BMI association for the FTO SNP rs55872725 (β = 0.45, p = 3.48×10⁻²²) and, using a significance level of p = 0.05, replicated 19 of 32 known BMI SNPs. For VTE, we observed the strongest association for the SNP rs2040445 (OR = 2.17, 95% CI: 1.79-2.63, p = 2.70×10⁻¹⁵), located downstream of F5, and also observed significant associations for the known ABO and F11 regions. This pooled resource can be used to maximize power in GWAS of phenotypes collected across the cohorts, and to study gene-environment interactions as well as rare phenotypes and genotypes.
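As a quick consistency check on the reported VTE result, the standard error of the log odds ratio can be recovered from the 95% confidence interval and converted to a Wald z statistic and a two-sided p-value. The sketch below assumes a normal (Wald) approximation on the log-odds scale; the helper name `wald_from_ci` is hypothetical, and this is not necessarily how the authors computed their statistics.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z and two-sided p from an odds ratio and its 95% CI.

    Assumes the CI was built as exp(log(OR) +/- z_crit * SE),
    i.e. a Wald interval on the log-odds scale.
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported above for rs2040445 (OR = 2.17, 95% CI: 1.79-2.63).
se, z, p = wald_from_ci(2.17, 1.79, 2.63)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, two-sided p = {p:.2e}")
```

Under these assumptions the recovered p-value is on the order of 10⁻¹⁵, in line with the reported p = 2.70×10⁻¹⁵; small differences are expected because the published odds ratio and interval are rounded to two decimals.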
