Structurama: Bayesian Inference of Population Structure

Structurama is a program for inferring population structure. Specifically, the program calculates the posterior probability of assigning individuals to different populations. The program takes as input a file containing the allelic information at some number of loci sampled from a collection of individuals. After reading a data file into computer memory, Structurama uses a Gibbs algorithm to sample assignments of individuals to populations. The program implements four different models: The number of populations can be considered fixed or a random variable with a Dirichlet process prior; moreover, the genotypes of the individuals in the analysis can be considered to come from a single population (no admixture) or as coming from several different populations (admixture). The output is a file of partitions of individuals to populations that were sampled by the Markov chain Monte Carlo algorithm. The partitions are sampled in proportion to their posterior probabilities. The program implements a number of ways to summarize the sampled partitions, including calculation of the ‘mean’ partition—a partition of the individuals to populations that minimizes the squared distance to the sampled partitions.

[1]  W. Ewens,et al.  The transmission/disequilibrium test: history, subdivision, and admixture. , 1995, American journal of human genetics.

[2]  C. Antoniak Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems , 1974 .

[3]  J. Huelsenbeck,et al.  Inference of Population Structure Under a Dirichlet Process Model , 2007, Genetics.

[4]  M. Sillanpää,et al.  Bayesian analysis of genetic differentiation between populations. , 2003, Genetics.

[5]  J. Pitman Combinatorial Stochastic Processes , 2006 .

[6]  R. Nielsen Statistical tests of selective neutrality in the age of genomics , 2001, Heredity.

[7]  M. Newton,et al.  Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity , 2006 .

[8]  D. Maddison,et al.  NEXUS: an extensible file format for systematic information. , 1997, Systematic biology.

[9]  École d'été de probabilités de Saint-Flour,et al.  École d'été de probabilités de Saint-Flour XIII - 1983 , 1985 .

[10]  D. White,et al.  Constructive combinatorics , 1986 .

[11]  Dan Gusfield,et al.  Partition-distance: A problem and class of perfect graphs arising in clustering , 2002, Inf. Process. Lett..

[12]  M. Stephens,et al.  Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. , 2003, Genetics.

[13]  Radford M. Neal Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[14]  P. Donnelly,et al.  Inference of population structure using multilocus genotype data. , 2000, Genetics.

[15]  E. Lorenzen,et al.  No suggestion of hybridization between the vulnerable black‐faced impala (Aepyceros melampus petersi) and the common impala (A. m. melampus) in Etosha National Park, Namibia , 2004, Molecular ecology.

[16]  Jukka Corander,et al.  BAPS 2: enhanced possibilities for the analysis of genetic population structure , 2004, Bioinform..

[17]  J. Pella,et al.  The Gibbs and splitmerge sampler for population mixture analysis from genetic data with incomplete baselines , 2006 .

[18]  Dipak K Dey,et al.  A Bayesian approach to inferring population structure from dominant markers , 2002, Molecular ecology.

[19]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[20]  G. Evanno,et al.  Detecting the number of clusters of individuals using the software structure: a simulation study , 2005, Molecular ecology.

[21]  P. Arctander,et al.  Regional genetic structuring and evolutionary history of the impala Aepyceros melampus. , 2006, The Journal of heredity.

[22]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .