Ancestry prediction efficiency of the software GenoGeographer using a z-score method and the ancestry informative markers in the Precision ID Ancestry Panel.

We compared the efficiency of the freely available software GenoGeographer, which includes a z-score based analysis, with that of a naïve method based on the maximal likelihoods of 164 of the 165 ancestral informative markers (AIMs) included in the commercially available Precision ID Ancestry Panel kit from Thermo Fisher Scientific. The AIM profiles were obtained from investigations with the Precision ID Ancestry Panel in our laboratory and from SNP data in the literature and publicly available databases. We established eight well-defined AIM reference population data sets from 3603 AIM profiles: six reference populations with profiles from multiple populations (Sub-Saharan Africa, North Africa, Middle East, Europe, South/Central Asia, East Asia) and two populations of individuals with admixed ancestry (Somalia and Greenland). Using GenoGeographer and naïve calculations of the maximal likelihoods, we tested 566 AIM profiles from individuals who were not included in the reference populations and who were expected to belong to one of the eight reference populations. An initial standard z-score based test with GenoGeographer showed that 22.4% of the individuals could not be assigned to any of the reference populations. Among the remaining 77.6% of the individuals, 83.6% were assigned to the reference population that was concordant with the individual's specified population of origin, 8.2% had ambiguous assignments because they could belong to both the specified population of origin and one or more of the other populations, and 8.2% were assigned to a reference population that was discordant with the specified population of origin. A naïve assignment based on the maximal likelihood resulted in 78.1% concordant and 21.9% discordant assignments.
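The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not GenoGeographer's implementation. It assumes biallelic markers in Hardy-Weinberg equilibrium, independent loci, allele frequencies strictly between 0 and 1, and a one-sided rejection limit; the function names, the `z_limit` value, and the toy frequencies are all hypothetical.

```python
import math

def genotype_prob(g, p):
    """Hardy-Weinberg probability of carrying g (0, 1 or 2) copies of an allele with frequency p."""
    if g == 0:
        return (1 - p) ** 2
    if g == 1:
        return 2 * p * (1 - p)
    return p ** 2

def log_likelihood(profile, freqs):
    """Log-likelihood of a profile, assuming independent loci (an assumption of this sketch)."""
    return sum(math.log(genotype_prob(g, p)) for g, p in zip(profile, freqs))

def z_score(profile, freqs):
    """Standardise the observed log-likelihood by its mean and variance under the population."""
    mean = 0.0
    var = 0.0
    for p in freqs:
        probs = [genotype_prob(g, p) for g in (0, 1, 2)]
        logs = [math.log(q) for q in probs]
        m = sum(q * l for q, l in zip(probs, logs))
        mean += m
        var += sum(q * (l - m) ** 2 for q, l in zip(probs, logs))
    return (log_likelihood(profile, freqs) - mean) / math.sqrt(var)

def assign(profile, populations, z_limit=1.645):
    """Return the naive maximum-likelihood population and the populations not rejected by the z-test.

    z_limit is a hypothetical one-sided cut-off, not a value taken from the study.
    """
    naive = max(populations, key=lambda name: log_likelihood(profile, populations[name]))
    accepted = [name for name, freqs in populations.items()
                if z_score(profile, freqs) > -z_limit]
    return naive, accepted

# Toy reference data: two made-up populations, four made-up markers.
populations = {"A": [0.9, 0.9, 0.9, 0.9], "B": [0.1, 0.1, 0.1, 0.1]}
typical_of_a = [2, 2, 2, 2]
atypical = [2, 0, 2, 0]
```

In this sketch the naïve method always names a population, even for a profile that fits none of the references well, whereas the z-score test may accept no population (no assignment), exactly one, or several (an ambiguous assignment).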
The results demonstrate that the z-score analysis with GenoGeographer can reduce the error rate by a factor of almost three compared with that of the naïve estimation based on the maximal likelihoods of the AIM profiles. The Precision ID Ancestry Panel is a useful kit for assigning ancestry among the eight investigated populations, which included two admixed populations. More AIMs with better discrimination and more data on the distribution of AIMs in relevant populations are needed to improve the efficiency of genogeographic prediction with AIMs on a worldwide basis.
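The "factor of almost three" can be checked from the reported percentages. The short Python calculation below uses only figures stated in the abstract; the variable names are illustrative, and the conversion to shares of all 566 tested profiles is a derived quantity rather than a reported one.

```python
# Percentages reported in the abstract, expressed as fractions.
not_assigned = 0.224          # profiles rejected by every reference population
assigned = 1 - not_assigned   # 0.776 of all tested profiles

# Outcomes among the assigned profiles (z-score analysis).
z_concordant = 0.836
z_ambiguous = 0.082
z_discordant = 0.082

# Outcomes of the naive maximum-likelihood assignment (all profiles are assigned).
naive_concordant = 0.781
naive_discordant = 0.219

# Ratio of discordant rates: naive versus z-score analysis.
error_ratio = naive_discordant / z_discordant

# Derived shares of all tested profiles under the z-score analysis.
overall = {
    "not assigned": not_assigned,
    "concordant": assigned * z_concordant,
    "ambiguous": assigned * z_ambiguous,
    "discordant": assigned * z_discordant,
}

print(f"error ratio: {error_ratio:.2f}")
for outcome, share in overall.items():
    print(f"{outcome}: {share:.1%}")
```

The ratio is about 2.7. The comparison is between discordant rates among assigned profiles only; the z-score analysis achieves its lower rate partly by declining to assign 22.4% of the profiles.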
