The Gibbs and split–merge sampler for population mixture analysis from genetic data with incomplete baselines

Although population mixtures often include contributions from novel populations as well as from previously sampled baseline populations, unlabeled mixture individuals can be assigned to their source populations from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described that successively partitions a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution of the number of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative meas...
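The output summaries described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sampler itself is omitted, the four partitions are invented toy data, and the function names (`subset_count_distribution`, `coassignment_matrix`, `build_tree`) are hypothetical. The abstract does not say how the binary tree is built from the co-assignment probabilities, so the average-linkage agglomeration used here is an assumption.

```python
from collections import Counter
from itertools import combinations


def subset_count_distribution(partitions):
    """Relative frequency of the number of subsets per sampled partition."""
    counts = Counter(len(set(labels)) for labels in partitions)
    total = len(partitions)
    return {k: c / total for k, c in sorted(counts.items())}


def coassignment_matrix(partitions):
    """Fraction of sampled partitions in which individuals i and j share a subset."""
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0
    m = len(partitions)
    return [[value / m for value in row] for row in co]


def build_tree(co):
    """Greedy agglomeration into a binary tree (average linkage is an assumption).

    Leaves are individual indices; each internal node is a tuple
    (left, right, mean co-assignment probability between the two branches).
    """
    clusters = [(i, [i]) for i in range(len(co))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            members_a, members_b = clusters[a][1], clusters[b][1]
            sim = sum(co[i][j] for i in members_a for j in members_b)
            sim /= len(members_a) * len(members_b)
            if best is None or sim > best[0]:
                best = (sim, a, b)
        sim, a, b = best
        merged = ((clusters[a][0], clusters[b][0], sim),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


# Toy data: four hypothetical sampler draws, each a label vector over
# four mixture individuals (labels are arbitrary within a draw).
partitions = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]

dist = subset_count_distribution(partitions)
co = coassignment_matrix(partitions)
tree = build_tree(co)
```

Because co-assignment depends only on whether two individuals share a subset within a draw, these summaries are unaffected by how subsets happen to be labeled from one draw to the next.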


[16]  S. MacEachern,et al.  Estimating mixture of dirichlet process models , 1998 .

[17]  Purushottam W. Laud,et al.  Bayesian Nonparametric Inference for Random Distributions and Related Functions , 1999 .

[18]  T. Beacham,et al.  Application of microsatellite DNA variation to estimation of stock composition and escapement of Nass River sockeye salmon (Oncorhynchus nerka). , 1999 .

[19]  Radford M. Neal Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[20]  M. Masuda,et al.  SPAM (version 3.2): statistics program for analyzing mixtures. , 2000, The Journal of heredity.

[21]  M. Stephens Dealing with label switching in mixture models , 2000 .

[22]  C. Robert,et al.  Computational and Inferential Difficulties with Mixture Posterior Distributions , 2000 .

[23]  P. Donnelly,et al.  Inference of population structure using multilocus genotype data. , 2000, Genetics.

[24]  A Vignal,et al.  Empirical evaluation of genetic clustering methods using multilocus genotypes from 20 chicken breeds. , 2001, Genetics.

[25]  K J Dawson,et al.  A Bayesian approach to the identification of panmictic populations and the assignment of individuals. , 2001, Genetical research.

[26]  E. Thompson,et al.  A model-based method for identifying species hybrids using multilocus genetic data. , 2002, Genetics.

[27]  M. Stephens,et al.  Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. , 2003, Genetics.

[28]  Fernando A. Quintana,et al.  Nonparametric Bayesian data analysis , 2004 .

[29]  Radford M. Neal,et al.  A Split-Merge Markov chain Monte Carlo Procedure for the Dirichlet Process Mixture Model , 2004 .

[30]  J. Nielsen,et al.  A Comparison of Genetic Variation Between an Anadromous Steelhead, Oncorhynchus mykiss, Population and Seven Derived Populations Sequestered in Freshwater for 70 Years , 2004, Environmental Biology of Fishes.

[31]  J. Pella,et al.  Classical individual assignments versus mixture modeling to estimate stock proportions in Atlantic salmon (Salmo salar) catches from DNA microsatellite data , 2005 .

[32]  T. Quinn,et al.  An empirical verification of population assignment methods by marking and parentage data: hatchery and wild steelhead (Oncorhynchus mykiss) in Forks Creek, Washington, USA , 2006, Molecular ecology.

[33]  Noah A. Rosenberg,et al.  CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure , 2007, Bioinform..