Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records

Purpose: The Vanderbilt DNA Databank (BioVU) is a biorepository that currently contains >80,000 DNA samples linked to electronic medical records. Although BioVU is a valuable source of samples and phenotypes for genetic association studies, it is unclear whether the administratively assigned race/ethnicity in BioVU can accurately describe and be used as a proxy for genetic ancestry.

Methods: We genotyped 360 single nucleotide polymorphisms on the Illumina DNA Test Panel containing ancestry informative markers in 1910 BioVU samples with observer-reported ancestry and 384 samples from the Multiple Sclerosis Genetics Group with self-reported ancestry. Genetic ancestry was inferred for all individuals using Structure 2.2.

Results: More than 98% of observer-reported European Americans were genetically inferred to have at least 60% European ancestry. Ninety-three percent of observer-reported African Americans were genetically inferred to be predominantly of African ancestry. We determined that the concordance of observer-reported race/ethnicity and inferred genetic ancestry was not significantly different from that of self-reported race/ethnicity in either population (P = 0.09 and 0.94 in European Americans and African Americans, respectively).

Conclusions: Observer-reported race/ethnicity for European Americans and African Americans approximates genetic ancestry as well as self-reported race/ethnicity, making biorepositories linked to electronic medical records such as BioVU a viable source of DNA samples for future large-scale genetic association studies.
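The concordance comparison described in the Results can be illustrated with a minimal Python sketch. It assumes that each individual has an inferred ancestry proportion (such as Structure might report), that concordance means meeting the 60% threshold mentioned above, and that the two reporting methods are compared with a two-sided Fisher exact test. The abstract does not name the test actually used, so that choice, the function names, and the toy proportions below are all illustrative assumptions rather than the authors' procedure or data.

```python
from math import comb

def concordance(proportions, threshold=0.60):
    """Return (concordant count, total) for inferred ancestry proportions.

    An individual counts as concordant when the inferred proportion of the
    reported ancestry is at least `threshold` (60% here, as in the abstract).
    """
    hits = sum(1 for p in proportions if p >= threshold)
    return hits, len(proportions)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed test: the abstract does not state which test produced its P values.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x concordant individuals in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Invented toy proportions, NOT data from the study.
observer_reported = [0.95, 0.88, 0.72, 0.55, 0.91]
self_reported = [0.97, 0.81, 0.40, 0.66]

obs_hits, obs_n = concordance(observer_reported)
self_hits, self_n = concordance(self_reported)
p_value = fisher_exact_two_sided(obs_hits, obs_n - obs_hits,
                                 self_hits, self_n - self_hits)
print(f"observer: {obs_hits}/{obs_n}, self: {self_hits}/{self_n}, P = {p_value:.2f}")
```

A large P value in such a comparison would be read as in the abstract: no detectable difference between the two reporting methods in how well they match inferred ancestry.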
