PRESTO: Rapid calculation of order statistic distributions and multiple-testing adjusted P-values via permutation for one and two-stage genetic association studies

Background: Large-scale genetic association studies can test hundreds of thousands of genetic markers for association with a trait. Because the genetic markers may be correlated, a Bonferroni correction is typically too stringent a correction for multiple testing. Permutation testing is a standard statistical technique for determining statistical significance when performing multiple correlated tests for genetic association. However, permutation testing for large-scale genetic association studies is computationally demanding and calls for optimized algorithms and software. PRESTO is a new software package for genetic association studies that performs fast computation of multiple-testing adjusted P-values via permutation of the trait.

Results: PRESTO is an order of magnitude faster than other existing permutation testing software, and can analyze a large genome-wide association study (500K markers, 5K individuals, 1K permutations) in approximately one hour of computing time. PRESTO has several unique features that are useful in a wide range of studies: it reports empirical null distributions for the top-ranked statistics (i.e., order statistics); it performs user-specified combinations of allelic and genotypic tests; it performs stratified analysis when sampled individuals are from multiple populations and each individual's population of origin is specified; and it determines significance levels for one- and two-stage genotyping designs. PRESTO is designed for case-control studies, but can also be applied to trio data (parents and affected offspring) if transmitted parental alleles are coded as case alleles and untransmitted parental alleles are coded as control alleles.

Conclusion: PRESTO is a platform-independent software package that performs fast and flexible permutation testing for genetic association studies. The PRESTO executable file, Java source code, example data, and documentation are freely available at http://www.stat.auckland.ac.nz/~browning/presto/presto.html.
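To make the idea of multiple-testing adjustment by trait permutation concrete, the following is a minimal Python sketch and not PRESTO's implementation (PRESTO itself is written in Java and is heavily optimized). It assumes genotypes coded as minor-allele counts (0, 1, 2), a binary case-control status, a 1-df allelic chi-square test, and the common (1 + count) / (1 + permutations) convention for empirical P-values. The function names `allelic_chi2` and `permutation_adjusted_pvalues` are hypothetical and chosen only for illustration.

```python
import random


def allelic_chi2(genotypes, status):
    """1-df chi-square statistic for the 2x2 allele-by-status table.

    genotypes: minor-allele counts (0, 1, 2), one per individual.
    status:    1 for case, 0 for control, one per individual.
    """
    case_minor = sum(g for g, s in zip(genotypes, status) if s == 1)
    ctrl_minor = sum(g for g, s in zip(genotypes, status) if s == 0)
    n_case_alleles = 2 * sum(status)
    n_ctrl_alleles = 2 * (len(status) - sum(status))
    a, b = case_minor, n_case_alleles - case_minor
    c, d = ctrl_minor, n_ctrl_alleles - ctrl_minor
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic marker or empty group: no evidence
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def permutation_adjusted_pvalues(markers, status, n_perm=1000, seed=0):
    """Adjusted P-values from the permutation null of the maximum statistic.

    markers: list of genotype lists, one list per marker.
    Returns (observed statistics, adjusted P-values, null maxima).
    """
    rng = random.Random(seed)
    observed = [allelic_chi2(m, status) for m in markers]

    shuffled = list(status)
    null_max = []
    for _ in range(n_perm):
        # Permuting the trait keeps the correlation between markers intact.
        rng.shuffle(shuffled)
        null_max.append(max(allelic_chi2(m, shuffled) for m in markers))

    # Assumed convention: count the observed data as one permutation.
    adjusted = [
        (1 + sum(1 for x in null_max if x >= obs)) / (n_perm + 1)
        for obs in observed
    ]
    return observed, adjusted, null_max
```

Here `null_max` is the empirical null distribution of the top-ranked statistic only. Keeping the k largest statistics from each permutation, rather than just the maximum, would give the null distributions of further order statistics under the same assumptions.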
