Population-specific reference panels are crucial for the genetic analyses of Native Hawai’ians: an example of the CREBRF locus

Statistical imputation applied to genome-wide array data is the most cost-effective approach to complete the catalog of genetic variation in a study population. However, imputed genotypes in underrepresented populations are less accurate because of ascertainment bias and a lack of representation among reference individuals, which adds to the obstacles to studying these populations. Here we examined the consequences of this lack of representation by genotyping a functionally important, Polynesian-specific variant, rs373863828, in the CREBRF gene, in a large number of self-reported Native Hawai’ians (N = 3,693) from the Multiethnic Cohort. We found that the derived allele of rs373863828 was significantly associated with several adiposity traits with large effects (e.g., for BMI, the most significant trait, 0.214 s.d., or approximately 1.28 kg/m², per allele; P = 7.5×10⁻⁵). Because Polynesian individuals are currently absent from publicly accessible reference sequences, neither rs373863828 nor any of its proxies could be tested through imputation using these existing resources. Moreover, the association signals at this Polynesian-specific variant could not be captured by alternative approaches, such as admixture mapping. In contrast, highly accurate imputation can be achieved with even a small number (<200) of Polynesian reference individuals. By constructing an internal set of Polynesian reference individuals, we were able to increase the sample size for analysis to as many as 3,936 individuals and to improve the statistical evidence of association (e.g., P = 1.5×10⁻⁷, 3×10⁻⁶, and 1.4×10⁻⁴ for BMI, hip circumference, and T2D, respectively). Taken together, our results suggest the alarming possibility that a lack of representation in reference panels would inhibit the discovery of functionally important, population-specific loci such as CREBRF. Yet such loci could be easily detected and prioritized with improved representation of diverse populations in sequencing studies.
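
The per-allele BMI effect above is given both in standard-deviation units and in kg/m², so the two figures can be related by simple arithmetic. The following is a minimal sketch of that conversion, not the authors' analysis code. The function names are hypothetical, and the two-allele calculation assumes a purely additive model, which the abstract does not state.

```python
# Illustrative arithmetic only; values are taken from the abstract,
# function names are hypothetical.

BETA_SD_UNITS = 0.214   # per-allele effect on BMI, in standard deviations
BETA_BMI_UNITS = 1.28   # approximate per-allele effect on BMI, in kg/m^2


def implied_trait_sd(effect_raw: float, effect_sd_units: float) -> float:
    """Trait standard deviation implied by one effect reported in two scales."""
    return effect_raw / effect_sd_units


def additive_shift(effect_per_allele: float, n_alleles: int) -> float:
    """Expected trait shift for a carrier of n_alleles copies.

    Assumes each copy contributes the same amount (additive model);
    this is an assumption of the sketch, not a claim from the study.
    """
    return effect_per_allele * n_alleles


sd_bmi = implied_trait_sd(BETA_BMI_UNITS, BETA_SD_UNITS)
print(f"Implied BMI s.d.: {sd_bmi:.2f} kg/m^2")
print(f"Shift for two copies: {additive_shift(BETA_BMI_UNITS, 2):.2f} kg/m^2")
```

Under these assumptions, the reported figures imply a BMI standard deviation of roughly 6 kg/m² in the analyzed sample.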
