Reconstructing genetic ancestry blocks in admixed individuals.

A chromosome in an individual of recently admixed ancestry resembles a mosaic of chromosomal segments, or ancestry blocks, each derived from a particular ancestral population. We consider the problem of inferring ancestry along the chromosomes in an admixed individual and thereby delineating the ancestry blocks. Using a simple population model, we infer gene-flow history in each individual. Compared with existing methods, which are based on a hidden Markov model, the Markov-hidden Markov model (MHMM) we propose has the advantage of accounting for the background linkage disequilibrium (LD) that exists in ancestral populations. When there are more than two ancestral groups, we allow each ancestral population to admix at a different time in history. We use simulations to illustrate the accuracy of the inferred ancestry as well as the importance of modeling the background LD: ignoring background LD between markers can lead to false inferences of mixed ancestry in an indigenous population. The MHMM makes it possible to identify genomic blocks of a particular ancestry using any high-density single-nucleotide-polymorphism panel. One application of our method is admixture mapping without genotyping special ancestry-informative-marker panels.
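
To make the modeling idea concrete, the following is a minimal sketch of the plain hidden Markov model baseline that the abstract contrasts with the MHMM; it is not the authors' method and does not model background LD. It assumes a single haploid chromosome, biallelic markers coded 0/1, a single admixture time, and ancestry switches between adjacent markers with probability 1 - exp(-generations * gap). The function name `ancestry_posteriors` and all parameter names are hypothetical, chosen for illustration only.

```python
import math

def ancestry_posteriors(alleles, freqs, gaps, mix, generations):
    """Posterior probability of each ancestry at each marker (forward-backward).

    alleles     : list of 0/1 alleles along one haploid chromosome
    freqs       : freqs[k][t] = frequency of allele 1 at marker t in population k
                  (assumed strictly between 0 and 1)
    gaps        : gaps[t] = genetic distance in Morgans between markers t and t+1
    mix         : mix[k] = genome-wide admixture proportion of population k
    generations : assumed number of generations since admixture
    """
    n, k = len(alleles), len(mix)

    def emit(t, pop):
        # Markers are treated as independent given ancestry (no background LD).
        p = freqs[pop][t]
        return p if alleles[t] == 1 else 1.0 - p

    def trans(t, i, j):
        # Transition from marker t-1 (ancestry i) to marker t (ancestry j).
        stay = math.exp(-generations * gaps[t - 1])
        return stay * (1.0 if i == j else 0.0) + (1.0 - stay) * mix[j]

    # Forward pass, rescaled at each marker to avoid underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            row = [mix[j] * emit(0, j) for j in range(k)]
        else:
            row = [emit(t, j) * sum(fwd[t - 1][i] * trans(t, i, j) for i in range(k))
                   for j in range(k)]
        total = sum(row)
        fwd.append([v / total for v in row])

    # Backward pass, also rescaled.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [sum(trans(t + 1, i, j) * emit(t + 1, j) * bwd[t + 1][j] for j in range(k))
               for i in range(k)]
        total = sum(row)
        bwd[t] = [v / total for v in row]

    # Combine and normalize to get per-marker posteriors.
    post = []
    for t in range(n):
        row = [fwd[t][j] * bwd[t][j] for j in range(k)]
        total = sum(row)
        post.append([v / total for v in row])
    return post

# Toy example with made-up frequencies: the first three markers carry the allele
# common in population 0, the last three the allele common in population 1.
post = ancestry_posteriors(
    alleles=[1, 1, 1, 0, 0, 0],
    freqs=[[0.9] * 6, [0.1] * 6],
    gaps=[0.01] * 5,
    mix=[0.5, 0.5],
    generations=10,
)
```

Ancestry blocks can then be read off as runs of markers whose most probable ancestry is the same. Because this sketch treats alleles as independent given ancestry, it illustrates exactly the simplification the abstract warns about: with dense markers, LD within an ancestral population can be mistaken for ancestry switches.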
