Ancestry Inference in Complex Admixtures via Variable-Length Markov Chain Linkage Models

Inferring the ancestral origin of chromosomal segments in admixed individuals is key for genetic applications ranging from the analysis of population demographics and history to the mapping of disease genes. Previous methods addressed ancestry inference using either weak models of linkage disequilibrium or large models that make explicit use of ancestral haplotypes. In this paper we introduce ALLOY, an efficient method that incorporates generalized but highly expressive linkage disequilibrium models. ALLOY applies a factorial hidden Markov model to capture the parallel process producing the maternal and paternal admixed haplotypes, and models the background linkage disequilibrium in the ancestral populations via an inhomogeneous variable-length Markov chain. We test ALLOY in a broad range of scenarios, from recent to ancient admixtures, with up to four ancestral populations. We show that ALLOY outperforms the previous state of the art and is robust to uncertainties in model parameters.
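To make the factorial hidden Markov model idea concrete, the following is a minimal sketch, not ALLOY's implementation. It tracks a joint hidden state (maternal ancestry, paternal ancestry) along a chromosome and computes posterior ancestry probabilities from unphased genotypes by forward-backward. Two simplifying assumptions are made purely for illustration: emissions use independent per-site allele frequencies in each ancestral population rather than a variable-length Markov chain model of linkage disequilibrium, and the ancestry switch probability between adjacent sites is a single constant. All function and parameter names (`posterior_ancestry`, `freqs`, `mix`, `switch`) are hypothetical.

```python
from itertools import product


def genotype_likelihood(g, pa, pb):
    """P(unphased genotype g in {0, 1, 2}) given the alt-allele
    frequencies pa and pb of the two haplotypes' ancestral populations."""
    if g == 0:
        return (1 - pa) * (1 - pb)
    if g == 1:
        return pa * (1 - pb) + (1 - pa) * pb
    return pa * pb


def chain_transition(i, j, switch, mix):
    """Single-haplotype ancestry transition: keep the current ancestry with
    probability 1 - switch, otherwise redraw it from the proportions mix."""
    return (1 - switch) * (1.0 if i == j else 0.0) + switch * mix[j]


def normalize(weights):
    """Rescale a dict of non-negative weights so they sum to one."""
    z = sum(weights.values())
    return {s: v / z for s, v in weights.items()}


def posterior_ancestry(genotypes, freqs, mix, switch):
    """Forward-backward over joint (maternal, paternal) ancestry states.

    genotypes: list of 0/1/2 alt-allele counts, one per site
    freqs:     freqs[k][t] = alt-allele frequency of population k at site t
               (assumed strictly between 0 and 1, so no division by zero)
    mix:       prior ancestry proportions, one per population
    switch:    assumed constant switch probability between adjacent sites
    Returns one dict per site mapping (a, b) to its posterior probability.
    """
    states = list(product(range(len(mix)), repeat=2))
    n_sites = len(genotypes)

    def emit(t, s):
        return genotype_likelihood(genotypes[t], freqs[s[0]][t], freqs[s[1]][t])

    def trans(s, r):
        # The two haplotype chains evolve independently (factorial structure).
        return (chain_transition(s[0], r[0], switch, mix)
                * chain_transition(s[1], r[1], switch, mix))

    # Forward pass, rescaled at every site to avoid underflow.
    fwd = [normalize({s: mix[s[0]] * mix[s[1]] * emit(0, s) for s in states})]
    for t in range(1, n_sites):
        prev = fwd[-1]
        fwd.append(normalize({
            r: emit(t, r) * sum(prev[s] * trans(s, r) for s in states)
            for r in states
        }))

    # Backward pass, also rescaled.
    bwd = [None] * n_sites
    bwd[-1] = {s: 1.0 for s in states}
    for t in range(n_sites - 2, -1, -1):
        nxt = bwd[t + 1]
        bwd[t] = normalize({
            s: sum(trans(s, r) * emit(t + 1, r) * nxt[r] for r in states)
            for s in states
        })

    # Combine both passes into per-site posteriors.
    return [normalize({s: fwd[t][s] * bwd[t][s] for s in states})
            for t in range(n_sites)]


# Toy usage with two populations and made-up allele frequencies.
toy_freqs = [[0.9] * 6, [0.1] * 6]
toy_post = posterior_ancestry([2, 2, 2, 0, 0, 0], toy_freqs, [0.5, 0.5], 0.05)
```

The joint state space grows as the square of the number of ancestral populations, which stays small for the up-to-four-population scenarios mentioned above. Replacing `genotype_likelihood` with a haplotype model that conditions on preceding alleles is where a richer linkage disequilibrium model would plug in.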
