Computer programs for population genetics data analysis: a survival guide

The analysis of genetic diversity within species is vital for understanding evolutionary processes at both the population and the genomic level. Large quantities of data can now be produced at an unprecedented rate, and dedicated computer programs are needed to extract all the information they contain. Several statistical packages offering a range of standard and more sophisticated analyses have recently been developed. We describe the functionalities, special features and assumptions of more than 20 such programs, indicate how they can interoperate, and discuss new directions that could lead to improved software and analyses.