Population-specific reference panels are crucial for genetic analyses: an example of the CREBRF locus in Native Hawaiians

Statistical imputation applied to genome-wide array data is the most cost-effective approach to completing the catalog of genetic variation in a study population. However, imputed genotypes in underrepresented populations are less accurate because of ascertainment bias and a lack of representation among reference individuals, which adds to the obstacles to studying these populations. Here we examined the consequences of this lack of representation by genotyping rs373863828, a functionally important, Polynesian-specific variant in the CREBRF gene, in a large sample of self-reported Native Hawaiians (N = 3693). We found that the derived allele was significantly associated with several adiposity traits with large effects (e.g., approximately 1.28 kg/m² per allele for BMI, the most significant association; P = 7.5 × 10⁻⁵), consistent with the original findings in Samoans. Because publicly accessible reference sequences currently lack Polynesian representation, neither rs373863828 nor its proxies could be tested through imputation using these existing resources. Moreover, the association signals across the entire CREBRF locus could not be captured by alternative approaches, such as admixture mapping. In contrast, highly accurate imputation can be achieved even when only a small number (<200) of internally constructed Polynesian reference individuals are available; this would increase sample size and improve the statistical evidence of associations. Taken together, our results suggest the alarming possibility that lack of representation in reference panels could inhibit the discovery of functionally important loci such as CREBRF. Yet such loci could be easily detected and prioritized with improved representation of diverse populations in sequencing studies.
