Spatial localization of recent ancestors for admixed individuals

Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such models yield superior accuracy in ancestry inference over non-model-based methods. Here we extend that work with a method that models admixture between ancestors from multiple sources across a geographic continuum. We devise efficient algorithms based on hidden Markov models to localize on a map the recent ancestors (e.g. grandparents) of admixed individuals while jointly assigning ancestry at each locus in the genome. We validate our methods using empirical data from individuals with mixed European ancestry from the POPRES study and show that our approach localizes their recent ancestors within an average of 470 km of the reported locations of their grandparents. Furthermore, simulations from real POPRES genotype data show that our method attains high accuracy in localizing recent ancestors of admixed individuals in Europe (an average of 550 km from their true location for localization of 2 ancestries in Europe, 4 generations ago). We explore the limits of ancestry localization under our approach and find that performance decreases as the number of distinct ancestries and the number of generations since admixture increase. Finally, we build a map of expected localization accuracy across admixed individuals according to the location of origin within Europe of their ancestors.

Author Summary

Inferring ancestry from genetic data is a fundamental problem with applications ranging from localizing disease genes to inference of human history. Recent approaches have introduced models of genetic variation as a function of geography and have shown that such models yield high accuracy in ancestry inference from genetic data. In this work we propose methods for modeling the mixing of genetic data from different sources (i.e. the admixture process) in a genetic-geographic continuum and show that using these methods we can accurately infer the ancestry of recent ancestors (e.g. grandparents) from genetic data.
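
To make the hidden Markov model idea concrete, here is a minimal sketch of Viterbi decoding of locus-specific ancestry along a single haplotype. It is a toy under stated assumptions, not the method of the paper: the logistic dependence of allele frequency on map coordinates, the single per-locus switch probability, the fixed candidate ancestor locations, and all function and parameter names (`allele_freqs`, `viterbi_local_ancestry`, `slopes`, `intercepts`, `switch_prob`) are hypothetical choices made for illustration.

```python
import numpy as np

def allele_freqs(locations, slopes, intercepts):
    """P(allele = 1) for each ancestor location and SNP, shape (K, M).

    Assumption: frequency is a logistic function of map coordinates.
    locations: (K, 2), slopes: (M, 2), intercepts: (M,).
    """
    logits = locations @ slopes.T + intercepts
    return 1.0 / (1.0 + np.exp(-logits))

def viterbi_local_ancestry(haplotype, freqs, switch_prob):
    """Most likely ancestor index at each SNP for a 0/1 haplotype.

    freqs: (K, M) allele-1 frequencies per ancestor, with K >= 2.
    switch_prob: assumed probability of changing ancestor between
    adjacent SNPs (a stand-in for recombination since admixture).
    Returns (path, log-probability of the best path).
    """
    K, M = freqs.shape
    # Log emission probability of the observed allele under each ancestor.
    emit = np.where(haplotype[None, :] == 1, np.log(freqs), np.log1p(-freqs))
    # Log transition matrix: stay on the diagonal, switch elsewhere.
    trans = np.full((K, K), np.log(switch_prob / (K - 1)))
    np.fill_diagonal(trans, np.log(1.0 - switch_prob))

    score = np.log(1.0 / K) + emit[:, 0]   # uniform prior over ancestors
    back = np.zeros((K, M), dtype=int)
    for j in range(1, M):
        cand = score[:, None] + trans       # cand[i, k]: move from i to k
        back[:, j] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + emit[:, j]

    # Trace the best path backwards from the best final state.
    path = np.empty(M, dtype=int)
    path[-1] = int(np.argmax(score))
    for j in range(M - 1, 0, -1):
        path[j - 1] = back[path[j], j]
    return path, float(score.max())
```

In this sketch the ancestor locations are given and only the per-locus assignment is decoded. Localization could then be approximated by scoring a grid of candidate location sets with the returned log-probability and keeping the best one, which is again an illustrative simplification rather than the procedure described above.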
