Reconstructing Past Admixture Processes from Local Genomic Ancestry Using Wavelet Transformation

Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. To test the method, we apply it to human genome-wide genotype data from Indonesia, as well as to virtual genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated levels of uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions.
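
To make the link between ancestry block length and wavelet scale concrete, the following is a minimal sketch, not the adwave implementation. It assumes an idealized local-ancestry signal (alternating blocks of 0 and 1 along a chromosome) and a plain Haar discrete wavelet transform; the abstract does not specify the wavelet family or normalization, so both are assumptions, and the names `block_signal`, `haar_detail_energy`, and `dominant_level` are hypothetical helpers introduced only for illustration. Because recombination shortens ancestry blocks over generations, longer blocks (more recent admixture) should concentrate signal energy at coarser wavelet levels.

```python
import math


def block_signal(n, block_len):
    """Idealized ancestry signal: alternating blocks of 0s and 1s (an assumption,
    standing in for a PCA-derived local ancestry signal)."""
    return [float((i // block_len) % 2) for i in range(n)]


def haar_detail_energy(signal):
    """Sum of squared Haar detail coefficients at each level.

    Index 0 is level 1 (finest scale). The signal length must be a power of two.
    """
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two (>= 2)")
    root2 = math.sqrt(2.0)
    approx = list(signal)
    energies = []
    while len(approx) > 1:
        # Pair up neighbours: differences give details, sums give the next approximation.
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / root2 for a, b in pairs]
        approx = [(a + b) / root2 for a, b in pairs]
        energies.append(sum(d * d for d in details))
    return energies


def dominant_level(signal):
    """1-based wavelet level carrying the most detail energy."""
    energies = haar_detail_energy(signal)
    return 1 + max(range(len(energies)), key=energies.__getitem__)


short_blocks = block_signal(64, 4)   # short blocks, as expected for older admixture
long_blocks = block_signal(64, 16)   # long blocks, as expected for recent admixture
print(dominant_level(short_blocks), dominant_level(long_blocks))  # prints: 3 5
```

In this toy setting the dominant level rises from 3 to 5 as block length grows from 4 to 16 positions. Real data are noisy and blocks are not aligned to a dyadic grid, so a summary of this kind would be compared against simulations (for example within an approximate Bayesian computation framework, as the abstract describes) rather than read off directly.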
