PopGenReport: simplifying basic population genetic analyses in R

Summary

1. Performing population genetic analyses in a scripting language such as R can improve the reproducibility of research, but R's steep learning curve makes it challenging for many researchers.

2. PopGenReport is a new R package that simplifies population genetic analyses in R by providing a report-generating function. The function popgenreport lets users run up to 13 pre-defined analyses and 1 user-defined analysis with a single command. Each analysis generates figures and tables that are incorporated into a PDF report and are also made available as individual files (figures in multiple formats, table contents as CSV files).

3. The package includes new R functions that simplify importing data from a spreadsheet file; examine allele distributions across populations and loci and identify private alleles; determine pairwise individual genetic distances using the methods of Smouse and Peakall (1999) and of Kosman and Leonard (2005); detect the presence of null alleles; calculate allelic richness; and test for spatial autocorrelation in genotypes using the methods of Smouse and Peakall (1999).

4. The package has a modular structure that makes adding new functionality straightforward. To facilitate the addition of user-designed functions, the package includes a fully customizable module whose output can be automatically included in the PDF report.

5. To support users not experienced in R, the website (www.popgenreport.org) provides a tutorial for the package and a downloadable, portable version of the package with LaTeX pre-configured for the Windows operating system.
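Two of the analyses named in the summary, identifying private alleles and the Smouse and Peakall (1999) individual genetic distance, can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the package's R implementation. The function names, the tuple-based genotype layout, and the example data are all assumptions made here. The distance is shown for a single codominant locus in diploids, as half the sum of squared differences in allele counts between two individuals.

```python
from collections import Counter


def private_alleles(genotypes_by_pop):
    """Return {population: sorted alleles seen only in that population}.

    genotypes_by_pop maps a population name to a list of genotypes at ONE
    locus, each genotype being a tuple of allele labels (hypothetical layout).
    """
    seen_in = {}  # allele -> set of populations in which it occurs
    for pop, genotypes in genotypes_by_pop.items():
        for genotype in genotypes:
            for allele in genotype:
                seen_in.setdefault(allele, set()).add(pop)

    result = {pop: [] for pop in genotypes_by_pop}
    for allele, pops in seen_in.items():
        if len(pops) == 1:  # private: observed in exactly one population
            result[next(iter(pops))].append(allele)
    return {pop: sorted(alleles) for pop, alleles in result.items()}


def smouse_peakall_distance(genotype_a, genotype_b):
    """Squared genetic distance between two individuals at one locus.

    Half the sum, over alleles, of the squared difference in allele counts.
    """
    counts_a, counts_b = Counter(genotype_a), Counter(genotype_b)
    alleles = set(counts_a) | set(counts_b)
    return sum((counts_a[k] - counts_b[k]) ** 2 for k in alleles) / 2


# Hypothetical example data: one microsatellite locus, two populations.
locus = {
    "popA": [(100, 102), (100, 100)],
    "popB": [(102, 104), (104, 104)],
}

print(private_alleles(locus))
print(smouse_peakall_distance((100, 100), (100, 102)))  # homozygote vs heterozygote sharing one allele
print(smouse_peakall_distance((100, 100), (102, 102)))  # two different homozygotes
```

In this sketch a multilocus distance would be obtained by summing the single-locus values over loci; missing data and ploidy other than two are not handled.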

[1] J. Goudet et al. Tests for sex‐biased dispersal using bi‐parentally inherited genetic markers, 2002, Molecular Ecology.

[2] N. J. Ouborg et al. Population genetics, molecular markers and the study of dispersal in plants, 1999.

[3] J. Brookfield. A simple new method for estimating null allele frequency from heterozygote deficiency, 1996.

[4] Robert J. Toonen et al. Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers, 2006, Ecology Letters.

[5] A. Nekrutenko et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences, 2010, Genome Biology.

[6] L. Waits et al. Estimating the probability of identity among genotypes in natural populations: cautions and guidelines, 2001, Molecular Ecology.

[7] R. Petit et al. High level of genetic differentiation for allelic richness among populations of the argan tree [Argania spinosa (L.) Skeels] endemic to Morocco, 1996, Theoretical and Applied Genetics.

[8] J. Strassmann et al. Microsatellites and kinship, 1993, Trends in Ecology & Evolution.

[9] J. Goudet. FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics, 1995.

[10] D. Hartl et al. Principles of population genetics, 1981.

[11] B. Rannala et al. Detecting immigration by using multilocus genotypes, 1997, Proceedings of the National Academy of Sciences of the United States of America.

[12] L. Excoffier et al. Computer programs for population genetics data analysis: a survival guide, 2006, Nature Reviews Genetics.

[13] Emmanuel Paradis et al. pegas: an R package for population genetics with an integrated-modular approach, 2010, Bioinformatics.

[14] R. Petit et al. Current trends in microsatellite genotyping, 2011, Molecular Ecology Resources.

[15] S. Daiger et al. Apparent heterozygote deficiencies observed in DNA typing data and their implications in forensic applications, 1992, Annals of Human Genetics.

[16] D. Winter. mmod: an R library for the calculation of population differentiation statistics, 2012, Molecular Ecology Resources.

[17] P. Smouse et al. genalex 6: genetic analysis in Excel. Population genetic software for teaching and research, 2006.

[18] Thibaut Jombart et al. adegenet 1.3-1: new tools for the analysis of genome-wide SNP data, 2011, Bioinformatics.

[19] A. Jones et al. Methods of parentage analysis in natural populations, 2003, Molecular Ecology.

[20] Thibaut Jombart et al. adegenet: a R package for the multivariate analysis of genetic markers, 2008, Bioinformatics.

[21] K. Leonard et al. Similarity coefficients for molecular markers in studies of genetic relationships between individuals for haploid, diploid, and polyploid species, 2005, Molecular Ecology.

[22] S. Blanchet. The use of molecular tools in invasion biology: an emphasis on freshwater ecosystems, 2012.

[23] J. Goudet. HIERFSTAT, a package for R to compute and test hierarchical F-statistics, 2005.

[24] François Rousset et al. GENEPOP (version 1.2): population genetic software for exact tests and ecumenicism, 1995.

[25] P. Donnelly et al. Inference of population structure using multilocus genotype data, 2000, Genetics.

[26] Spatial autocorrelation analysis offers new insights into gene flow in the Australian bush rat, Rattus fuscipes, 2003, Evolution; international journal of organic evolution.

[27] B. Weir et al. Estimating F‐statistics for the analysis of population structure, 1984, Evolution; international journal of organic evolution.

[28] J. Goudet. FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3). Updated from Goudet (1995), 2001.

[29] Rod Peakall et al. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update, 2012, Bioinformatics.

[30] Rod Peakall et al. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure, 1999, Heredity.