Spatial Localization of Recent Ancestors for Admixed Individuals

Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such models yield more accurate ancestry inference than methods that are not model-based. Here we extend this work with a method that models admixture between ancestors from multiple sources across a geographic continuum. We devise efficient algorithms based on hidden Markov models to localize on a map the recent ancestors (e.g., grandparents) of admixed individuals, jointly with assigning ancestry at each locus in the genome. We validate our method on empirical data from individuals with mixed European ancestry from the Population Reference Sample study and show that it localizes their recent ancestors within an average of 470 km of the reported locations of their grandparents. Furthermore, simulations based on real Population Reference Sample genotype data show that our method attains high accuracy in localizing recent ancestors of admixed individuals in Europe (an average of 550 km from their true location when localizing two ancestries in Europe, four generations ago). We explore the limits of ancestry localization under our approach and find that performance decreases as the number of distinct ancestries and the number of generations since admixture increase. Finally, we build a map of expected localization accuracy for admixed individuals according to the location of origin of their ancestors within Europe.
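
The accuracy figures above are distances in kilometers between inferred and reported ancestor locations. As a minimal sketch of how such an error could be computed, the Python below uses the haversine great-circle formula. The abstract does not say which distance formula was used, so the formula, the Earth-radius constant, and the function names `haversine_km` and `mean_localization_error_km` are all assumptions made for illustration. The sketch also assumes that each inferred location has already been paired with its reported counterpart.

```python
import math

# Mean Earth radius in km (an assumption; the abstract does not specify one).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_localization_error_km(inferred, reported):
    """Average distance over already-matched pairs of (lat, lon) tuples.

    Hypothetical helper: pairing inferred ancestors with reported
    ancestors is assumed to have been done beforehand.
    """
    if not inferred or len(inferred) != len(reported):
        raise ValueError("inferred and reported must be non-empty and of equal length")
    total = sum(haversine_km(*p, *q) for p, q in zip(inferred, reported))
    return total / len(inferred)
```

For example, `mean_localization_error_km([(48.86, 2.35)], [(52.52, 13.40)])` gives the distance between a point near Paris and a point near Berlin, treated as a single inferred/reported pair.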
