Straightforward Inference of Ancestry and Admixture Proportions through Ancestry-Informative Insertion Deletion Multiplexing

Ancestry-informative markers (AIMs) show high allele frequency divergence between ancestral or geographically distant populations. These genetic markers are especially useful for inferring the likely ancestral origin of an individual or for estimating the apportionment of ancestry components in admixed individuals or populations. AIMs are of great interest in clinical genetics research, particularly for detecting and correcting population substructure effects in case-control association studies, and also in population and forensic genetics. This work presents a set of 46 ancestry-informative insertion deletion polymorphisms selected to efficiently measure population admixture proportions for four different origins (African, European, East Asian and Native American). All markers are analyzed as short fragments (under 230 base pairs) in a single PCR followed by capillary electrophoresis (CE), allowing a very simple one-tube PCR-to-CE approach. HGDP-CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of the assay in clustering populations of different continental origins and to establish reference databases. In addition, other populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results revealed that the AIM-INDEL set developed here is highly efficient at inferring the ancestry of individuals and provides good estimates of ancestry proportions at the population level. In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, providing a more straightforward alternative to commonly available AIM-SNP typing methods, which depend on complex multi-step protocols or on large-scale genotyping technologies.
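
The abstract does not spell out the estimation procedure. As a hedged illustration only, the sketch below shows one standard way to obtain both kinds of estimate from biallelic INDEL genotypes once reference allele frequencies are available (for example, from HGDP-CEPH reference groups): a maximum-likelihood assignment of an individual to a single population, and an expectation-maximization (EM) estimate of individual admixture proportions with the reference frequencies held fixed. It assumes Hardy-Weinberg equilibrium, independent loci, and a model in which each allele copy descends from population k with probability q_k. It is not necessarily the authors' method; the function names, the genotype coding (insertion-allele count, `None` for a failed locus), the frequency clamp `eps` and the toy frequencies in the demo are all assumptions.

```python
import math
from typing import List, Optional, Sequence

Genotype = Optional[int]  # insertion-allele count (0, 1 or 2); None if the locus failed


def _clamp(f: float, eps: float) -> float:
    # Keep frequencies away from 0 and 1 so no observed allele has zero probability.
    return min(max(f, eps), 1.0 - eps)


def most_likely_population(
    genotypes: Sequence[Genotype],
    ref_freqs: Sequence[Sequence[float]],
    labels: Sequence[str],
    eps: float = 1e-6,
) -> str:
    """Return the label of the reference population with the highest genotype
    log-likelihood, assuming Hardy-Weinberg equilibrium and independent loci."""
    best_label, best_ll = labels[0], -math.inf
    for label, row in zip(labels, ref_freqs):
        ll = 0.0
        for g, f in zip(genotypes, row):
            if g is None:
                continue
            f = _clamp(f, eps)
            # Binomial log-probability without C(2, g), which is the same for every population.
            ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
        if ll > best_ll:
            best_label, best_ll = label, ll
    return best_label


def admixture_proportions(
    genotypes: Sequence[Genotype],
    ref_freqs: Sequence[Sequence[float]],
    max_iter: int = 1000,
    tol: float = 1e-10,
    eps: float = 1e-6,
) -> List[float]:
    """EM estimate of one individual's ancestry proportions q_1..q_K, with the
    reference insertion-allele frequencies ref_freqs[k][locus] held fixed."""
    k_pops = len(ref_freqs)
    p = [[_clamp(f, eps) for f in row] for row in ref_freqs]
    loci = [locus for locus, g in enumerate(genotypes) if g is not None]
    if not loci:
        raise ValueError("no non-missing genotypes")
    n_alleles = 2 * len(loci)
    q = [1.0 / k_pops] * k_pops  # start from equal proportions
    for _ in range(max_iter):
        expected = [0.0] * k_pops
        for locus in loci:
            g = genotypes[locus]
            # Insertion-allele frequency implied by the current mixture at this locus.
            f = sum(q[k] * p[k][locus] for k in range(k_pops))
            for k in range(k_pops):
                # E-step: expected share of the g insertion alleles and the
                # (2 - g) deletion alleles that descend from population k.
                expected[k] += g * q[k] * p[k][locus] / f
                expected[k] += (2 - g) * q[k] * (1.0 - p[k][locus]) / (1.0 - f)
        # M-step: proportions = expected allele counts / total observed alleles.
        new_q = [e / n_alleles for e in expected]
        done = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if done:
            break
    return q


if __name__ == "__main__":
    # Made-up insertion-allele frequencies for three hypothetical populations at four loci.
    labels = ["POP1", "POP2", "POP3"]
    freqs = [
        [0.90, 0.10, 0.80, 0.20],
        [0.10, 0.90, 0.20, 0.30],
        [0.50, 0.50, 0.10, 0.90],
    ]
    sample = [2, 1, None, 0]  # third locus failed to genotype
    print(most_likely_population(sample, freqs, labels))
    print([round(x, 3) for x in admixture_proportions(sample, freqs)])
```

Because each allele copy is split fractionally across the K populations, the expected counts at every locus sum to 2, so the updated proportions always sum to one. The `eps` clamp is a pragmatic guard, assumed for this sketch, against zero frequencies estimated from small reference samples.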
