Structurama: Bayesian Inference of Population Structure

Structurama is a program for inferring population structure. Specifically, it calculates the posterior probability of assigning individuals to different populations. The input is a file containing allelic information at some number of loci sampled from a collection of individuals. After reading the data file into memory, Structurama uses a Gibbs sampling algorithm to sample assignments of individuals to populations. The program implements four models, formed by two independent choices: the number of populations can be treated as fixed or as a random variable with a Dirichlet process prior, and the genotypes of the individuals in the analysis can be treated as coming from a single population (no admixture) or from several different populations (admixture). The output is a file of partitions of individuals into populations, sampled by the Markov chain Monte Carlo algorithm in proportion to their posterior probabilities. The program provides several ways to summarize the sampled partitions, including calculation of the ‘mean’ partition, a partition of the individuals into populations that minimizes the squared distance to the sampled partitions.
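The ‘mean’ partition summary can be illustrated with a small sketch. It rests on two assumptions that the abstract does not state. First, the distance between two partitions is taken to be the partition distance: the fewest individuals that must be removed for the two partitions to agree. Second, candidates are restricted to the sampled partitions themselves. The function names `partition_distance` and `mean_partition` are hypothetical, and the program's own search procedure may differ.

```python
from collections import Counter
from itertools import permutations


def partition_distance(a, b):
    """Fewest individuals to remove so that partitions a and b agree.

    a and b are equal-length lists of group labels, one label per individual.
    Labels must be sortable and must not be None. Group matchings are
    enumerated by brute force, so this is only practical for a handful
    of groups.
    """
    if len(a) != len(b):
        raise ValueError("partitions must cover the same individuals")
    # overlap[(ga, gb)] = individuals in group ga of a and group gb of b
    overlap = Counter(zip(a, b))
    groups_a = sorted(set(a))
    groups_b = sorted(set(b))
    # Pad b's groups with None so every group of a can be matched to something.
    k = max(len(groups_a), len(groups_b))
    groups_b = groups_b + [None] * (k - len(groups_b))
    best = 0
    for perm in permutations(groups_b, len(groups_a)):
        kept = sum(overlap.get(pair, 0) for pair in zip(groups_a, perm))
        best = max(best, kept)
    return len(a) - best


def mean_partition(samples):
    """Sampled partition with the smallest summed squared distance to all samples.

    Assumption: only sampled partitions are considered as candidates.
    """
    def cost(candidate):
        return sum(partition_distance(candidate, s) ** 2 for s in samples)
    return min(samples, key=cost)


# Toy MCMC output: four sampled partitions of four individuals.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],  # same grouping as the first sample, labels swapped
]
summary = mean_partition(samples)
```

Because the distance is computed after matching groups, it is unaffected by how the populations happen to be labeled in each sample, which matters when labels can switch between MCMC draws.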

John P. Huelsenbeck | Peter Andolfatto | Edna T. Huelsenbeck
