PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs

The analysis of genetic data often requires combining several approaches implemented in different, and sometimes incompatible, programs. To facilitate data exchange and file conversion between population genetics programs, we introduce PGDSpider, a Java program that can read 27 different file formats and export data into 29 file formats, which partially overlap with the input formats. The PGDSpider package includes both an intuitive graphical user interface and a command-line version, which allows it to be integrated into complex data analysis pipelines.

Availability: PGDSpider is freely available under the BSD 3-Clause license at http://cmpg.unibe.ch/software/PGDSpider/.
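The command-line version is what lets a conversion step sit inside a scripted pipeline. The following is a minimal Python sketch of such a step. It assumes a CLI jar named `PGDSpider2-cli.jar` and the flags `-inputfile`, `-inputformat`, `-outputfile`, `-outputformat` and `-spid` (a settings file holding the conversion options). The jar name, flags, format identifiers (`FSTAT`, `GENEPOP`) and file names are assumptions to be checked against the manual of the installed version. The helper functions are hypothetical and are not part of PGDSpider.

```python
"""Hypothetical pipeline wrapper around the PGDSpider command-line version.

ASSUMPTION: jar name, flags and format identifiers below follow common
PGDSpider CLI usage; verify them against the manual for your version.
"""
import subprocess


def build_pgdspider_command(jar, input_file, input_format,
                            output_file, output_format, spid_file,
                            max_heap="1024m"):
    """Return the argument list for one conversion (hypothetical helper).

    Passing a list to subprocess avoids shell quoting problems with paths.
    """
    return [
        "java", f"-Xmx{max_heap}", "-jar", str(jar),
        "-inputfile", str(input_file),
        "-inputformat", input_format,      # assumed format identifier
        "-outputfile", str(output_file),
        "-outputformat", output_format,    # assumed format identifier
        "-spid", str(spid_file),           # assumed settings-file flag
    ]


def convert(jar, input_file, input_format, output_file, output_format, spid_file):
    """Run one conversion; raises CalledProcessError if Java exits non-zero."""
    cmd = build_pgdspider_command(jar, input_file, input_format,
                                  output_file, output_format, spid_file)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command, so the sketch runs without Java or PGDSpider installed.
    print(" ".join(build_pgdspider_command(
        "PGDSpider2-cli.jar", "samples.dat", "FSTAT",
        "samples_genepop.txt", "GENEPOP", "fstat_to_genepop.spid")))
```

Building the argument list separately from running it keeps the conversion step easy to log, test, or reuse with different format pairs in a larger workflow.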
